A Corpus Search System Utilizing Lexical Dependency Structure
This paper presents a corpus search system that utilizes lexical dependency structure. The user's query is a sequence of keywords. For a given query, the system automatically generates dependency structure patterns consisting of the keywords in the query, and returns the sentences whose dependency structures match the generated patterns. The patterns are generated by two operations, combining and interpolation, both of which draw on the dependency structures in the searched corpus. As a result, the system generates only dependency structure patterns that occur in the corpus. The system offers simple and intuitive corpus search while being linguistically sophisticated enough to utilize structural information.
1. Introduction
Several corpus search systems have been presented. Most provide keyword-based search, which is simple and intuitive but not linguistically sophisticated enough to utilize structural information. On the other hand, Corley et al. (2001) and Resnik and Elkiss (2005) have presented corpus search systems that utilize syntactic structure: Gsearch and the Linguist's Search Engine (LSE), respectively. These systems search corpora by using phrase structure patterns. In Gsearch, the user gives the system a phrase structure pattern and a grammar. The system constructs parse trees of the sentences in the corpus by using the given grammar, and returns the sentences whose parse trees match the given pattern. In LSE, the user first gives an example of the kind of sentence he/she needs. The system parses the example by using a statistical parser and returns the parsing result. The user then edits the resulting parse tree to specify a structural query, and the system returns the sentences whose parse trees match that query. Gsearch and LSE can thus search corpora by utilizing syntactic information, but neither offers the simplicity of keyword-based search.
This paper presents a corpus search system that automatically generates structural queries from keyword-based queries. The system searches corpora based on lexical dependency information. The user's query is a sequence of keywords. For a given query, the system generates dependency structure patterns by using two operations: combining and interpolation. Because the patterns are generated automatically, the user needs neither to build a grammar, as in Gsearch, nor to edit a structural query, as in LSE. The system thus offers simple and intuitive corpus search while being linguistically sophisticated enough to utilize structural information.
2. Corpus Search based on Dependency Structure
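To make the idea of matching a keyword query against dependency structure concrete, here is a minimal Python sketch. This excerpt does not define the combining and interpolation operations, so the sketch does not implement them. It swaps in a simpler, different technique: for each sentence that contains all the keywords, it extracts the dependency edges that link the keywords through their lowest common ancestor, and groups sentences by the resulting edge set. The function names (`path_to_root`, `connecting_pattern`, `search`), the `CORPUS` data, and the two toy parses are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

Token = Tuple[str, Optional[int]]      # (word, index of its head; None marks the root)
Sentence = List[Token]
Pattern = FrozenSet[Tuple[str, str]]   # set of (dependent word, head word) edges


def path_to_root(sent: Sentence, i: int) -> List[int]:
    """Indices from token i up to the root, following head links."""
    path = [i]
    while sent[path[-1]][1] is not None:
        path.append(sent[path[-1]][1])
    return path


def connecting_pattern(sent: Sentence, keywords: List[str]) -> Optional[Pattern]:
    """Edges linking the first occurrence of each keyword via their lowest
    common ancestor, or None if some keyword is absent from the sentence."""
    words = [w for w, _ in sent]
    if any(k not in words for k in keywords):
        return None
    paths = [path_to_root(sent, words.index(k)) for k in keywords]
    # Nodes that lie on every keyword's path to the root.
    common = set(paths[0]).intersection(*paths[1:])
    # Paths run bottom-up, so the first shared node is the lowest one.
    lca = next(n for n in paths[0] if n in common)
    edges = set()
    for p in paths:
        for node in p[: p.index(lca)]:
            head = sent[node][1]
            edges.add((sent[node][0], sent[head][0]))
    return frozenset(edges)


def search(corpus: List[Sentence], keywords: List[str]) -> Dict[Pattern, List[str]]:
    """Group matching sentences by the dependency pattern that links the keywords."""
    hits: Dict[Pattern, List[str]] = defaultdict(list)
    for sent in corpus:
        pattern = connecting_pattern(sent, keywords)
        if pattern is not None:
            hits[pattern].append(" ".join(w for w, _ in sent))
    return dict(hits)


# Hypothetical hand-parsed toy corpus (assumed parses, for illustration only).
CORPUS: List[Sentence] = [
    [("the", 1), ("system", 2), ("searches", None), ("large", 4), ("corpora", 2)],
    [("the", 1), ("system", 2), ("returns", None), ("sentences", 2),
     ("from", 2), ("corpora", 4)],
]

for pattern, sentences in search(CORPUS, ["system", "corpora"]).items():
    print(sorted(pattern), sentences)
```

Because every pattern in this sketch is read off a parsed sentence, only patterns that occur in the corpus can be returned. That mirrors the property stated above, although the paper reaches it through its own two operations.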
Computer science, Interpolation, Speech recognition, Dependency structure, Lexical functional grammar, Natural language processing, Artificial intelligence
Yoshihide Kato
Shigeki Matsubara
Yasuyoshi Inagaki