Title
Integrating Concepts and Knowledge in Large Content Networks.
Abstract
Large content networks such as the World Wide Web contain huge amounts of information that could be integrated, because their components fit within common concepts or are connected through hidden, implicit relationships. One attempt at such integration is the “Web of Data” program, an evolution of the Semantic Web. It targets semi-structured information sources such as Wikipedia, turns them into fully structured Web-based databases such as DBpedia, and then integrates them with other public databases such as Geonames. However, the vast majority of the information on the Web is still entirely unstructured; this is the starting point for our approach, which aims to integrate unstructured information sources. To this end, we use techniques from Probabilistic Topic Modeling to cluster Web pages into concepts (topics), which are then related through higher-level concept networks; we also surface implicit semantic relationships between individual Web pages. The approach has been tested in a number of case studies, which are described here. While the applied focus of this research is knowledge integration in the specific and relevant case of the WWW, the wider aim is to provide an integration framework generally applicable to all complex content networks where information propagates from multiple sources.
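The abstract does not give algorithmic details, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each Web page already has a topic distribution produced by some probabilistic topic model (for example LDA); the page names, the distributions, the use of Hellinger distance, and the `max_distance` threshold are all hypothetical choices made for the example. Pages are grouped by dominant topic, and an implicit link is proposed between two pages whose topic distributions are close.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


def dominant_topic(theta):
    """Index of the topic with the highest probability for one page."""
    return max(range(len(theta)), key=theta.__getitem__)


def implicit_links(page_topics, max_distance=0.3):
    """Pairs of pages whose topic distributions are within max_distance.

    The threshold is an assumption for illustration, not a value from the paper.
    """
    links = []
    for (u, p), (v, q) in combinations(sorted(page_topics.items()), 2):
        d = hellinger(p, q)
        if d <= max_distance:
            links.append((u, v, d))
    return links


# Hypothetical per-page topic distributions (each row sums to 1).
page_topics = {
    "page_a": [0.90, 0.05, 0.05],
    "page_b": [0.85, 0.10, 0.05],
    "page_c": [0.05, 0.05, 0.90],
}

# Cluster pages by their dominant topic.
clusters = {}
for page, theta in page_topics.items():
    clusters.setdefault(dominant_topic(theta), []).append(page)

links = implicit_links(page_topics)
```

With this toy input, `page_a` and `page_b` share a dominant topic and are the only pair close enough to be linked; `page_c` ends up in its own cluster.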
Year
2014
DOI
10.1007/s00354-014-0407-4
Venue
New Generation Comput.
Keywords
Knowledge Integration, Semantic Analysis, Topic Modeling
Field
World Wide Web, Web intelligence, Web page, Semantic Web Stack, Computer science, Web standards, Data Web, Semantic Web, Web modeling, Social Semantic Web
DocType
Journal
Volume
32
Issue
3-4
ISSN
0288-3635
Citations
2
PageRank
0.48
References
20
Authors
4
Name | Order | Citations | PageRank
Marco Rossetti | 1 | 2 | 0.48
Remo Pareschi | 2 | 601162.52
Fabio Stella | 3 | 16019.72
Francesca Arcelli Fontana | 4 | 50547.66