Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction - Citegraph

Paper Info

Title
Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction

Abstract
Term extraction, a key data preparation step in text mining, extracts terms, i.e. relevant collocations of words attached to specific concepts (e.g. genetic-algorithms and decision-trees are terms associated with the concept "Machine Learning"). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm that exploits a few collocations manually labelled as interesting or not interesting. From these examples, the ROGER algorithm learns a numerical function that induces a ranking on the collocations. This ranking is optimized using genetic algorithms, maximizing the area under the ROC curve, which captures the trade-off between the true positive and false positive rates. The approach represents each word collocation as the vector of values of the standard statistical interestingness measures attached to that collocation. As this representation is general (over corpora and natural languages), generality tests were performed by applying the ranking function learned from an English corpus in biology to a French corpus of curricula vitae, and vice versa, showing good robustness of the approaches compared to the state-of-the-art Support Vector Machine (SVM).
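The idea in the abstract (score each collocation from its vector of interestingness measures, then evolve the scoring function to maximize the area under the ROC curve) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the scoring function is taken to be linear, the search is a simple (1+λ)-style mutation-and-selection loop standing in for the genetic algorithm, and the names `auc`, `score`, `evolve_ranker` and the toy data are hypothetical. This is not the ROGER implementation.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def score(weights, features):
    """Assumed linear scoring function over the measure vector."""
    return sum(w * x for w, x in zip(weights, features))


def evolve_ranker(X, y, generations=100, offspring=10, sigma=0.3, seed=0):
    """(1+lambda)-style search: mutate the parent weight vector with
    Gaussian noise and keep the best child if its AUC is not worse."""
    rng = random.Random(seed)
    parent = [rng.gauss(0.0, 1.0) for _ in range(len(X[0]))]
    parent_auc = auc([score(parent, x) for x in X], y)
    for _ in range(generations):
        children = [
            [w + rng.gauss(0.0, sigma) for w in parent]
            for _ in range(offspring)
        ]
        scored = [(auc([score(c, x) for x in X], y), c) for c in children]
        child_auc, child = max(scored, key=lambda t: t[0])
        if child_auc >= parent_auc:
            parent, parent_auc = child, child_auc
    return parent, parent_auc


# Toy data (made up for illustration): each row is one collocation described
# by two hypothetical interestingness measures; label 1 = "interesting".
X = [[0.9, 0.1], [0.8, 0.5], [0.7, 0.3], [0.2, 0.9], [0.1, 0.4], [0.3, 0.6]]
y = [1, 1, 1, 0, 0, 0]

weights, train_auc = evolve_ranker(X, y)
ranking = sorted(range(len(X)), key=lambda i: score(weights, X[i]), reverse=True)
print("training AUC:", train_auc)
print("collocations ranked by learned score:", ranking)
```

Because the learned function only depends on the measure vector, the same weights could in principle be applied to collocations from a different corpus, which is the kind of transfer the generality tests in the abstract examine.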

Year	Venue	Keywords
2004	International Conference on Computational Intelligence	terminology extraction, evolutionary algorithm, ROC curve, text mining, supervised learning, support vector machine, false positive, machine learning, decision tree, genetic algorithm
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
20	4

Authors (4 rows)

Name	Order	Citations	PageRank
Jérôme Azé	1	73	15.66
Mathieu Roche	2	96	24.74
Yves Kodratoff	3	581	172.25
Michèle Sebag	4	1547	138.94
