Title
Fingerprinting Lexical Contexts over the Web
Abstract
This paper presents a novel technique for identifying lexical contexts in web resources. The basic idea is to treat the anchor texts of a web site as lexicalized descriptions of an individual ontology, organized as a graph of concept words. To search for peculiar semantic patterns, the concept of a web minutia (borrowed from the forensic domain) is introduced. The proposed technique searches the analyzed web sites for web minutiae by means of a golden ontology. Web minutiae act as fingerprints for context-specific web resources; in this sense they are a powerful computational tool for identifying and categorizing web resources. The WordNet database was used as the golden ontology in our experiments on English web documents. WordNet allows for indexing and retrieving word senses and inter-word taxonomical relations such as hyponymy and hypernymy. It has proven to be an efficient mediator between web ontologies and context-dependent taxonomies. Our experiments were carried out on a preliminary data set of several tens of thousands of links taken from the web sites of thirteen UK universities. Preliminary results seem to confirm the ability of web minutiae to identify lexical contexts across the Web.
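The abstract does not define a web minutia formally, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that a candidate minutia is a link between two anchor texts whose words stand in a hyponym or hypernym relation in the golden ontology. The tiny taxonomy `TOY_HYPERNYMS` and the function names are hypothetical and hand-written for illustration; they are not WordNet data and not the authors' implementation.

```python
# Toy stand-in for a golden ontology: word -> its direct hypernym (parent).
# ASSUMPTION: hand-written entries for illustration only, not WordNet data.
TOY_HYPERNYMS = {
    "university": "institution",
    "institution": "organization",
    "department": "division",
    "faculty": "division",
    "division": "organization",
    "lecture": "course",
    "seminar": "course",
    "course": "education",
}


def hypernym_chain(word):
    """Return the ancestors of `word` in the toy taxonomy, nearest first."""
    chain = []
    seen = {word}
    while word in TOY_HYPERNYMS:
        word = TOY_HYPERNYMS[word]
        if word in seen:  # guard against accidental cycles in the toy data
            break
        seen.add(word)
        chain.append(word)
    return chain


def is_hyponym_of(child, parent):
    """True if `parent` appears among the ancestors of `child`."""
    return parent in hypernym_chain(child)


def find_candidate_minutiae(anchor_edges):
    """Keep the anchor-text links that the toy ontology can explain.

    `anchor_edges` holds (source_anchor, target_anchor) pairs, one per
    hyperlink. ASSUMPTION: a candidate minutia is a link whose two words
    are in a hyponym/hypernym relation; the label describes the target
    relative to the source.
    """
    matches = []
    for src, dst in anchor_edges:
        s, d = src.lower(), dst.lower()
        if is_hyponym_of(d, s):
            matches.append((src, dst, "hyponym"))
        elif is_hyponym_of(s, d):
            matches.append((src, dst, "hypernym"))
    return matches


edges = [
    ("Course", "Lecture"),
    ("Course", "Seminar"),
    ("Department", "Contact"),
    ("Faculty", "Organization"),
]
for match in find_candidate_minutiae(edges):
    print(match)
```

In a real setting the toy dictionary would be replaced by lookups against WordNet, and anchor texts would need tokenization and word-sense handling, neither of which this sketch attempts.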
Year
2009
Venue
JOURNAL OF UNIVERSAL COMPUTER SCIENCE
Keywords
minutia, golden ontology, Semantic Web, Web Mining, Knowledge Discovery, WordNet
Field
Data mining, World Wide Web, Web mining, Web intelligence, Information retrieval, Semantic Web Stack, Computer science, Web standards, Data Web, Semantic Web, Web modeling, Social Semantic Web
DocType
Journal
Volume
15
Issue
4
ISSN
0948-695X
Citations
5
PageRank
0.45
References
28
Authors
3
Name                 Order   Citations   PageRank
Vincenzo Di Lecce    1       94          17.49
Marco Calabrese      2       31          6.87
Domenico Soldo       3       19          2.90