Title
Morphological lexicon extraction from raw text data
Abstract
The tool extract enables the automatic extraction of lemma-paradigm pairs from raw text data. The tool uses search patterns that consist of regular expressions and propositional logic. These search patterns define sufficient conditions for including lemma-paradigm pairs in the lexicon, on the basis of word forms occurring in the data. This paper explains the search pattern syntax of extract as well as the search algorithm, and discusses the design of search patterns from the point of view of recall and precision. The extract tool was developed for morphologies defined in the Functional Morphology tool [1], but it is usable for all systems that implement a word-and-paradigm description of a morphology. The usefulness of the tool is demonstrated by a case study on the Canadian Hansards Corpus of French. The result is evaluated in terms of the precision of the extracted lemmas and statistics on coverage and rule productiveness. Competitive extraction figures show that human-written rules in a tailored tool are a time-efficient approach to the task at hand.
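The idea in the abstract, that a lemma-paradigm pair is accepted once certain word forms are attested in the data, can be illustrated with a minimal sketch. This is not the search pattern syntax or the search algorithm of extract, which the record does not reproduce. The `Rule` class, its fields, the `extract_pairs` function, and the `verb_er` paradigm are all hypothetical names chosen for illustration. The stem is constrained by a regular expression, and the logical condition is limited to a conjunction of required forms plus negated forbidden forms.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule format, not the actual extract syntax."""
    paradigm: str          # name of the paradigm the rule proposes
    stem: str              # regular expression the stem must fully match
    lemma_suffix: str      # suffix that turns a stem into the lemma
    required: tuple        # suffixes whose word forms must all occur
    forbidden: tuple = ()  # suffixes whose word forms must not occur


def extract_pairs(tokens, rules):
    """Return the set of (lemma, paradigm) pairs licensed by the rules."""
    forms = set(tokens)
    found = set()
    for rule in rules:
        stem_re = re.compile(rule.stem)
        first = rule.required[0]
        # Candidate stems come from forms ending in the first required suffix.
        for form in forms:
            if not form.endswith(first):
                continue
            stem = form[: len(form) - len(first)]
            if not stem_re.fullmatch(stem):
                continue
            # Conjunction: every required form is attested in the data.
            has_all = all(stem + s in forms for s in rule.required)
            # Negation: no forbidden form is attested in the data.
            has_none = not any(stem + s in forms for s in rule.forbidden)
            if has_all and has_none:
                found.add((stem + rule.lemma_suffix, rule.paradigm))
    return found


# Toy rule: a French first-conjugation verb needs three attested forms
# and must not also appear with an -ir infinitive.
rules = [Rule("verb_er", r"[a-zé]{2,}", "er", ("er", "ons", "ez"), ("ir",))]
tokens = "nous parlons vous parlez ils aiment parler finir finissons".split()
print(extract_pairs(tokens, rules))
```

Requiring more attested forms in a rule of this kind would be expected to raise precision at the cost of recall, which is the trade-off the abstract refers to.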
Year
2006
DOI
10.1007/11816508_49
Venue
FinTAL
Keywords
competitive extraction figure, tailored tool, functional morphology tool, automatic extraction, search algorithm, raw text data, morphological lexicon extraction, tool extract, extract tool, search pattern, lemma-paradigm pair, search pattern syntax, propositional logic, regular expression
Field
Regular expression, Search algorithm, Computer science, Precision and recall, Propositional calculus, Natural language, Lexicon, Natural language processing, Artificial intelligence, Syntax, Design pattern
DocType
Conference
Volume
4139
ISSN
0302-9743
ISBN
3-540-37334-9
Citations
15
PageRank
1.06
References
7
Authors
3
Name                 Order   Citations   PageRank
Markus Forsberg      1       111         15.77
Harald Hammarström   2       79          7.78
Aarne Ranta          3       316         36.02