Title
HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset.
Abstract
This work is a detailed companion reproducibility paper for the methods and experiments proposed by Lastra-Díaz and García-Serrano (2015, 2016) [56–58], and it introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML), based on PosetHERep, which implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity based on HESML and ReproZip, with the aim of exactly reproducing the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks in the current semantic measures libraries, especially their performance and scalability, as well as by the need to evaluate new methods and replicate most previous methods. The reproducible experiments introduced herein are encouraged by the lack of a set of large, self-contained and easily reproducible experiments with the aim of replicating and confirming previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and difficulties in reproducing previously reported methods and experiments.
PosetHERep proposes a memory-efficient representation for taxonomies which scales linearly with the size of the taxonomy and provides an efficient implementation of most taxonomy-based algorithms used by the semantic measures and IC models, whilst HESML provides an open framework to aid research into the area by providing a simpler and more efficient software architecture than the current software libraries. Finally, we prove that HESML outperforms the state-of-the-art libraries, and that their performance and scalability can be significantly improved without caching by using PosetHERep.
Year
2017
Venue
Inf. Syst.
Field
Ontology, Data mining, Computer science, Software, Artificial intelligence, Semantic similarity, Data structure, Information retrieval, Software architecture, Java, Planar graph, Machine learning, Database, Scalability
DocType
Journal
Volume
66
Citations
6
PageRank
0.41
References
71
Authors
5
Name | Order | Citations | PageRank
Juan J. Lastra-Díaz | 1 | 24 | 2.62
Ana M. García-Serrano | 2 | 59 | 12.14
Montserrat Batet | 3 | 899 | 37.20
Miriam Fernández | 4 | 158 | 11.57
Fernando Seabra Chirigati | 5 | 205 | 16.38