Title
Ontology-driven web-based semantic similarity
Abstract
Estimating the degree of semantic similarity (or distance) between concepts is a common problem in research areas such as natural language processing, knowledge acquisition, information retrieval and data mining. Many similarity measures have been proposed that exploit either explicit knowledge, such as the structure of a taxonomy, or implicit knowledge, such as information distribution. In the former case, taxonomies and/or ontologies are used to introduce additional semantics; in the latter case, frequencies of term appearances in a corpus are considered. Classical measures based on those premises suffer from some problems: in the first case, an excessive dependency on the taxonomical/ontological structure; in the second case, the lack of semantics of a purely statistical analysis of occurrences and/or the ambiguity of estimating the statistical distribution of concepts from term appearances. Measures based on the Information Content (IC) of taxonomical concepts combine both approaches. However, to compute accurate concept appearance probabilities they depend heavily on a corpus that has been properly pre-tagged and disambiguated according to the ontological entities. This limits the applicability of those measures to other ontologies (such as specific domain ontologies) and to massive corpora (such as the Web). In this paper, several of these issues are analyzed and modifications of classical similarity measures are proposed. They are based on a contextualized and scalable version of IC computation on the Web that exploits taxonomical knowledge. The goal is to remove the measures' dependency on corpus pre-processing while still achieving reliable results and minimizing language ambiguity. Our proposals are able to outperform classical approaches when the Web is used to estimate concept probabilities.
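To make the notion of an IC-based measure concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes the standard definition IC(c) = -log2 p(c) and uses Resnik's measure (the IC of the least common subsumer of two concepts) as one well-known example of the classical measures the abstract refers to. The taxonomy, the hit counts and the corpus size are all made-up values standing in for an ontology and for web search hit counts.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "bird": "animal",
}

# Made-up page counts standing in for web search hit counts (assumption).
HITS = {"entity": 900, "animal": 500, "mammal": 200,
        "dog": 120, "cat": 100, "bird": 150}
TOTAL_PAGES = 1000  # made-up corpus size (assumption)


def ancestors(concept):
    """Return the concept followed by its ancestors, up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes a and b."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):  # walks upward from b, so deepest first
        if candidate in seen:
            return candidate
    raise ValueError("concepts share no subsumer")


def ic(concept):
    """Information content, with p(c) estimated as hits / total pages."""
    return -math.log2(HITS[concept] / TOTAL_PAGES)


def resnik(a, b):
    """Resnik similarity: the IC of the least common subsumer."""
    return ic(lcs(a, b))
```

With these toy numbers, resnik("dog", "cat") is larger than resnik("dog", "bird"), because "mammal" is rarer, and therefore more informative, than "animal". The sketch also shows the weakness the abstract points out: raw term counts say nothing about which sense of a word a page uses, so the estimated probabilities are only as good as the counts behind them.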
Year: 2010
DOI: 10.1007/s10844-009-0103-x
Venue: J. Intell. Inf. Syst.
Keywords: Semantic similarity, Ontologies, Information content, Web, Knowledge discovery
Field: Data mining, Ontology, Computer science, Artificial intelligence, Natural language processing, Web application, Ambiguity, Semantic similarity, Ontology (information science), Information retrieval, Knowledge extraction, Knowledge acquisition, Machine learning, Semantics
DocType: Journal
Volume: 35
Issue: 3
ISSN: 0925-9902
Citations: 48
PageRank: 1.57
References: 26
Authors
4
Name
Order
Citations
PageRank
David Sánchez169033.01
Montserrat Batet289937.20
Aida Valls356120.52
Karina Gibert428134.01