Abstract |
---|
In this paper we define the document phrase maximality index (DPM-index), a new measure for discriminating between overlapping keyphrase candidates in a text document. As an application, we developed a supervised learning system that uses 18 statistical features, among them the DPM-index and five other new features. We experimentally compared our results with those of 21 keyphrase extraction methods on the SemEval-2010/Task-5 corpus of scientific articles. When all systems extract 10 keyphrases per document, our method improves on the F-score of the best system by 13%. In particular, the DPM-index feature increases the F-score of our keyphrase extraction system by 9%, which makes its contribution comparable to that of the well-known TFIDF measure in such a system. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1177/0165551514530210 | Journal of Information Science |
Keywords | Field | DocType |
information extraction, keyphrase extraction, scientific digital libraries, text mining | Text mining, Information retrieval, Computer science, Phrase, Information extraction, Text document | Journal |
Volume | Issue | ISSN |
40 | 4 | 0165-5515 |
Citations | PageRank | References |
5 | 0.42 | 44 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mounia Haddoud | 1 | 18 | 1.66 |
Saïd Abdeddaïm | 2 | 5 | 0.76 |