Abstract | ||
---|---|---|
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness. |
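
As a rough illustration of the step the abstract describes, where FCA turns term/context information into a lattice, the following minimal sketch enumerates the formal concepts of a tiny binary context. The terms, the attributes, and the helper names (`common_attributes`, `objects_having`, `formal_concepts`) are hypothetical toy choices made for this example, not the authors' data or implementation. The naive enumeration over all object subsets is exponential and only suitable for toy inputs.

```python
from itertools import combinations

# Hypothetical toy context: each term (object) maps to the set of
# attributes assumed to have been derived from its syntactic dependencies.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
}


def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Objects whose attribute set contains every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Naively enumerate all formal concepts as (extent, intent) pairs."""
    all_attributes = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Closing an arbitrary object set yields a formal concept.
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def is_subconcept(lower, upper):
    """Lattice order: a concept is below another if its extent is a subset."""
    return lower[0] <= upper[0]


concepts = formal_concepts(context)
# List concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by extent inclusion (`is_subconcept`) gives the lattice; converting that lattice into a concept hierarchy, as the abstract mentions, is a further step this sketch does not attempt.
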
Year | DOI | Venue |
---|---|---|
2011 | 10.1613/jair.1648 | Journal of Artificial Intelligence Research |
Keywords | DocType | Volume |
resulting concept hierarchy,automatic acquisition,novel approach,concept hierarchy,divisive clustering algorithm,text corpus,context information,data sparseness,formal concept analysis,certain term,artificial intelligence,partial order | Journal | abs/1109.2140 |
Issue | ISSN | Citations |
1 | Journal Of Artificial Intelligence Research, Volume 24, pages 305-339, 2005 | 260 |
PageRank | References | Authors |
10.17 | 36 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Philipp Cimiano | 1 | 3338 | 217.41 |
Andreas Hotho | 2 | 3232 | 210.84 |
Steffen Staab | 3 | 6658 | 593.89 |