Ontology-driven web-based semantic similarity - Citegraph

Paper Info

Title
Ontology-driven web-based semantic similarity

Abstract
Estimating the degree of semantic similarity or distance between concepts is a very common problem in research areas such as natural language processing, knowledge acquisition, information retrieval and data mining. Many similarity measures have been proposed in the past, exploiting either explicit knowledge (such as the structure of a taxonomy) or implicit knowledge (such as information distribution). In the former case, taxonomies and/or ontologies are used to introduce additional semantics; in the latter case, frequencies of term appearances in a corpus are considered. Classical measures based on those premises suffer from some problems: in the first case, their excessive dependence on the taxonomical/ontological structure; in the second case, the lack of semantics of a purely statistical analysis of occurrences and/or the ambiguity of estimating the statistical distribution of concepts from term appearances. Measures based on the Information Content (IC) of taxonomical concepts combine both approaches. However, they depend heavily on a corpus that has been properly pre-tagged and disambiguated according to the ontological entities in order to compute accurate concept appearance probabilities. This limits the applicability of those measures to other ontologies (such as specific domain ontologies) and to massive corpora (such as the Web). In this paper, several of the presented issues are analyzed. Modifications of classical similarity measures are also proposed. They are based on a contextualized and scalable version of IC computation in the Web that exploits taxonomical knowledge. The goal is to avoid the measures' dependency on corpus pre-processing for achieving reliable results, and to minimize language ambiguity. Our proposals are able to outperform classical approaches when using the Web for estimating concept probabilities.
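
For readers unfamiliar with IC-based measures, the following is a minimal sketch of the general idea the abstract refers to: the standard definition IC(c) = -log p(c), with p(c) estimated from page counts, and a Resnik-style similarity that takes the IC of the least common subsumer of two concepts. It is not the paper's contextualized formulation. The taxonomy, the hit counts, the total page count and all function names are hypothetical values chosen for illustration.

```python
import math

# Hypothetical page counts, invented for illustration (not real search-engine data).
TOTAL_PAGES = 1_000_000
HITS = {"mammal": 50_000, "dog": 20_000, "cat": 25_000}

# Toy taxonomy as child -> parent links (also hypothetical).
PARENT = {"dog": "mammal", "cat": "mammal", "mammal": None}


def information_content(term, hits=HITS, total=TOTAL_PAGES):
    """IC(c) = -log2 p(c), with p(c) estimated as hits(c) / total."""
    return -math.log2(hits[term] / total)


def ancestors(term):
    """Return the term followed by its ancestors, up to the root."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def least_common_subsumer(a, b):
    """Return the most specific concept that subsumes both a and b."""
    b_chain = set(ancestors(b))
    for node in ancestors(a):
        if node in b_chain:
            return node
    return None


def resnik_similarity(a, b):
    """Resnik-style similarity: the IC of the least common subsumer."""
    return information_content(least_common_subsumer(a, b))
```

Calling `resnik_similarity("dog", "cat")` returns the IC of `"mammal"` in this toy setup. Counting raw term hits in this way is exactly where the ambiguity mentioned in the abstract enters, since a term such as "cat" may appear on pages unrelated to the intended concept.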

Year	2010
DOI	10.1007/s10844-009-0103-x
Venue	J. Intell. Inf. Syst.
Keywords	Semantic similarity, Ontologies, Information content, Web, Knowledge discovery
Field	Data mining, Ontology, Computer science, Artificial intelligence, Natural language processing, Web application, Ambiguity, Semantic similarity, Ontology (information science), Information retrieval, Knowledge extraction, Knowledge acquisition, Machine learning, Semantics
DocType	Journal
Volume	35
Issue	3
ISSN	0925-9902
Citations	48
PageRank	1.57
References	26
Authors	4

Authors (4 rows)


Name	Order	Citations	PageRank
David Sánchez	1	690	33.01
Montserrat Batet	2	899	37.20
Aida Valls	3	561	20.52
Karina Gibert	4	281	34.01