Abstract | ||
---|---|---|
Document retrieval on natural languages with a rich morphology -- particularly in terms of derivation and (single-word) composition -- suffers from serious performance degradation with the direct query-term-to-text-word matching paradigm that underlies the vast majority of current search engines. We propose an alternative approach in which morphologically complex word forms, which appear in the query as well as in the documents, are segmented into relevant subwords (such as stems, named entities, acronyms) and are subsequently submitted to the matching procedure. We evaluate our approach with the Alta Vista Search Engine on a large medical document collection. |
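
The subword idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon `SUBWORDS`, the longest-prefix-first segmentation with backtracking, and the helpers `segment`, `index_terms`, and `matches` are all assumptions made for illustration. The sketch only shows why matching on subwords can succeed where direct word-to-word matching fails.

```python
from typing import List, Optional, Set

# Hypothetical toy subword lexicon (an assumption, not the paper's lexicon).
SUBWORDS: Set[str] = {"gastr", "o", "enter", "itis", "hepat", "logy"}


def segment(word: str, lexicon: Set[str]) -> Optional[List[str]]:
    """Split a word into lexicon subwords, trying longer prefixes first.

    Returns None when no complete segmentation exists.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in lexicon:
            rest = segment(word[end:], lexicon)
            if rest is not None:
                return [prefix] + rest
    return None


def index_terms(text: str, lexicon: Set[str]) -> Set[str]:
    """Map a text to subwords; unsegmentable tokens are kept whole."""
    terms: Set[str] = set()
    for token in text.split():
        parts = segment(token, lexicon)
        terms.update(parts if parts is not None else [token.lower()])
    return terms


def matches(query: str, document: str, lexicon: Set[str]) -> bool:
    """True when every query subword also occurs in the document."""
    return index_terms(query, lexicon) <= index_terms(document, lexicon)


if __name__ == "__main__":
    doc = "acute Gastroenteritis"
    print(segment("Gastroenteritis", SUBWORDS))
    print(matches("enteritis", doc, SUBWORDS))
    print(matches("hepatitis", doc, SUBWORDS))
```

With direct word matching, the query "enteritis" would miss a document that contains only "Gastroenteritis". After segmentation, both sides share the subwords "enter" and "itis", so the query matches.
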
Year | DOI | Venue |
---|---|---|
2001 | 10.1007/3-540-44816-0_8 | IDA |
Keywords | Field | DocType |
complex word form,search engine,direct query-term-to-text-word,alta vista,large medical document collection,current search engine,matching procedure,alternative approach,natural language,document retrieval,morphologically complex languages | Search engine,Computer science,Natural language,Natural language processing,Artificial intelligence,Document retrieval | Conference |
ISBN | Citations | PageRank |
3-540-42581-0 | 0 | 0.34 |
References | Authors | |
10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Udo Hahn | 1 | 937 | 88.14 |
Martin Honeck | 2 | 21 | 2.62 |
Stefan Schulz | 3 | 6 | 1.38 |