Title
Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction
Abstract
Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocations of words, attached to specific concepts (e.g. genetic-algorithms and decision-trees are terms associated with the concept "Machine Learning"). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm that exploits a few collocations manually labelled as interesting or not interesting. From these examples, the ROGER algorithm learns a numerical function that induces a ranking on the collocations. This ranking is optimized using genetic algorithms to maximize the area under the ROC curve, which summarizes the trade-off between the true positive and false positive rates. The approach represents each word collocation as the vector of values of the standard statistical interestingness measures attached to that collocation. As this representation is general (across corpora and natural languages), generality tests were performed by applying the ranking function learned from an English corpus in Biology to a French corpus of Curriculum Vitae, and vice versa, showing good robustness of the approach compared with the state-of-the-art Support Vector Machine (SVM).
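The ranking criterion described in the abstract can be illustrated with a minimal Python sketch. It assumes a linear scoring function over the measure vector, which the abstract does not confirm as ROGER's actual form; the toy vectors, labels, weights, and the names `auc` and `linear_score` are hypothetical. The genetic search over weights is omitted; only the fitness (area under the ROC curve, computed as the Wilcoxon-Mann-Whitney statistic) is shown.

```python
from itertools import product


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def linear_score(weights, measures):
    """Assumed scoring function: weighted sum of the measure values."""
    return sum(w * m for w, m in zip(weights, measures))


# Hypothetical toy data: each collocation is a vector of three
# interestingness-measure values, labelled 1 (interesting) or 0 (not).
collocations = [
    ([0.9, 0.8, 0.7], 1),
    ([0.6, 0.9, 0.5], 1),
    ([0.4, 0.2, 0.6], 0),
    ([0.7, 0.1, 0.3], 0),
]
weights = [0.2, 0.7, 0.1]  # one candidate a search procedure might evaluate

scores = [linear_score(weights, x) for x, _ in collocations]
labels = [y for _, y in collocations]
ranking_quality = auc(scores, labels)
```

In this sketch an evolutionary search would treat `ranking_quality` as the fitness of a candidate `weights` vector and keep the candidates that rank labelled interesting collocations above the others.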
Year
2004
Venue
International Conference on Computational Intelligence
Keywords
terminology extraction, evolutionary algorithm, ROC curve, text mining, supervised learning, support vector machine, false positive, machine learning, decision tree, genetic algorithm
DocType
Conference
Citations
0
PageRank
0.34
References
20
Authors
4
Name             Order  Citations  PageRank
Jérôme Azé       1      73         15.66
Mathieu Roche    2      96         24.74
Yves Kodratoff   3      581        172.25
Michèle Sebag    4      1547       138.94