Title
A practical experience concerning the parallel semantic annotation of a large-scale data collection
Abstract
From a computational point of view, the semantic annotation of large-scale data collections is an extremely expensive task. One way of dealing with this cost is to distribute the execution of the annotation algorithm across several computing environments. In this paper, we show how the problem of semantically annotating a large-scale collection of learning objects was addressed. The terms related to each learning object were processed, and the output was an RDF graph computed from the DBpedia database. According to an initial study, a sequential implementation of the annotation algorithm would require more than 1600 CPU-years to process the whole set of learning objects (about 15 million). For this reason, a framework able to integrate a set of heterogeneous computing infrastructures was used to execute a new parallel version of the algorithm. As a result, the problem was solved in 178 days.
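The figures quoted in the abstract can be put in proportion with a back-of-envelope calculation. The sketch below is not taken from the paper. It assumes that the parallel run performed roughly the same total work as the sequential estimate and that the 1600 CPU-year figure (a lower bound in the abstract) is spread evenly over the learning objects. The function and constant names are hypothetical.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# Assumption: the parallel run does about the same total work as the
# sequential estimate; overheads and idle time are ignored.

DAYS_PER_YEAR = 365.25
SECONDS_PER_YEAR = DAYS_PER_YEAR * 24 * 3600

SEQUENTIAL_CPU_YEARS = 1600      # "more than 1600 CPU-years" (lower bound)
LEARNING_OBJECTS = 15_000_000    # "about 15 million"
PARALLEL_WALL_DAYS = 178         # reported wall-clock time


def cpu_seconds_per_object(cpu_years: float, n_objects: int) -> float:
    """Average sequential CPU time per learning object, in seconds."""
    return cpu_years * SECONDS_PER_YEAR / n_objects


def implied_average_concurrency(cpu_years: float, wall_days: float) -> float:
    """Average number of busy cores needed to finish in wall_days."""
    return cpu_years * DAYS_PER_YEAR / wall_days


per_object = cpu_seconds_per_object(SEQUENTIAL_CPU_YEARS, LEARNING_OBJECTS)
concurrency = implied_average_concurrency(SEQUENTIAL_CPU_YEARS, PARALLEL_WALL_DAYS)

print(f"CPU time per object: about {per_object / 60:.0f} minutes")
print(f"Implied average concurrency: about {concurrency:.0f} cores")
```

Under these assumptions, each learning object costs on the order of an hour of CPU time, and finishing in 178 days implies a few thousand cores busy on average. The real figures depend on details the abstract does not give.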
Year
2013
DOI
10.1145/2506182.2506191
Venue
I-SEMANTICS
Keywords
computational point, heterogeneous computing infrastructure, practical experience, large-scale collection, dbpedia database, computing environment, semantic annotation, whole set, annotation algorithm, parallel semantic annotation, rdf graph, large-scale data collection
Field
Drawback, Data collection, Data mining, Annotation, Semantic annotation, Information retrieval, Computer science, Symmetric multiprocessor system, Learning object, RDF graph
DocType
Conference
Citations
1
PageRank
0.37
References
19
Authors
6
Name                 Order  Citations  PageRank
Javier Fabra         1      53         10.12
Sergio Hernández     2      6          1.80
Pedro Álvarez        3      57         11.56
Estefanía Otero      4      1          0.37
Juan Carlos Vidal    5      63         9.75
Manuel Lama          6      383        34.84