Title
Measuring semantic similarity between digital forensics terminologies using web search engines
Abstract
Measuring semantic similarity between terms is a general problem that spans numerous domains, including computational linguistics, artificial intelligence, cognitive science and, in the case of this paper, digital forensics. Despite the usefulness of semantic similarity measures in these domains, accurately measuring the semantic similarity between any two terms remains a challenging task. The main difficulty lies in developing a computational method whose results are close to how human beings perceive the terms, especially when the terms are used in their domain of expertise. This paper presents a novel approach that uses the Web to measure semantic similarity between two terms x and y in the digital forensics domain. The proposed approach is based on the Euclidean distance, a mathematical concept used to calculate the distance between two points. The paper also shows how the absolute value of the difference of the logarithms of the hit count percentages of any given terms x and y relates to the computed Euclidean distance of x and y. Percentages are computed from the total number of hit counts reported by a Web search engine for the search terms x, y and the logical x AND y together. Finally, these concepts are used to deduce a formula that automatically calculates a semantic similarity measure, coined the Digital Forensic Absolute Semantic Similarity Value of the terms x and y and denoted DFASSV(x, y). The experiments conducted using the proposed DFASSV method focus on the digital forensics domain, and a comparison of the DFASSV approach with previously proposed Web-based semantic similarity measures shows that it is well suited to digital forensics terminologies. In the authors' opinion, however, the DFASSV approach can be applied in other domains as well, because it does not require any human-annotated knowledge.
DFASSV is a novel approach to measuring semantic similarity and constitutes the main contribution of this paper.
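The abstract does not state the DFASSV formula itself, so the following is only a minimal sketch of the intermediate quantity it describes: the absolute value of the difference of the logarithms of the hit count percentages of x and y. It assumes that the percentages are taken over the sum of the three hit counts (x, y, and x AND y) and that base-10 logarithms are used; neither choice is confirmed by the abstract. The function names and the hit counts in the usage lines are hypothetical.

```python
import math


def hit_percentages(hits_x, hits_y, hits_xy):
    """Return the percentage share of each hit count.

    Assumption: the total is the sum of the counts for x, y and "x AND y".
    """
    counts = (hits_x, hits_y, hits_xy)
    if any(c < 0 for c in counts):
        raise ValueError("hit counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one hit count must be positive")
    return tuple(100.0 * c / total for c in counts)


def abs_log_difference(hits_x, hits_y, hits_xy):
    """Absolute difference of the base-10 logs of the percentages of x and y."""
    p_x, p_y, _ = hit_percentages(hits_x, hits_y, hits_xy)
    if p_x == 0 or p_y == 0:
        raise ValueError("logarithm undefined for a zero percentage")
    return abs(math.log10(p_x) - math.log10(p_y))


def euclidean_distance_1d(a, b):
    """Euclidean distance between two points on a line."""
    return math.sqrt((a - b) ** 2)


# Hypothetical hit counts, chosen only for illustration.
p_x, p_y, p_xy = hit_percentages(1000, 100, 50)
print(abs_log_difference(1000, 100, 50))
print(euclidean_distance_1d(math.log10(p_x), math.log10(p_y)))
```

In one dimension the Euclidean distance between log p_x and log p_y is exactly |log p_x - log p_y|, which is one way to read the relationship the abstract mentions. Under the assumption above, the shared total cancels in the difference, so the value depends only on the ratio of the hit counts of x and y.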
Year
2012
DOI
10.1109/ISSA.2012.6320448
Venue
Information Security for South Africa
Keywords
Internet,computer forensics,search engines,DFASSV(x, y),Euclidean distance,Web search engines,artificial intelligence,cognitive science,computational linguistics,digital forensic absolute semantic similarity value terms x and y,digital forensics terminologies,logical x AND y,mathematical concept,semantic similarity measurement,Semantic similarity,Web,absolute value,digital forensic domain terminologies,digital forensics
Field
Semantic similarity,Search engine,Semantic search,Computer forensics,Information retrieval,Digital forensics,Computer science,Computational linguistics,The Internet
DocType
Conference
ISBN
978-1-4673-2160-0
Citations
0
PageRank
0.34
References
6
Authors
2
Name                Order    Citations    PageRank
Nickson M. Karie    1        4            2.58
Hein S. Venter      2        58           8.01