Title
SABINE: A Multi-purpose Dataset of Semantically-Annotated Social Content.
Abstract
Social Business Intelligence (SBI) is the discipline that combines corporate data with social content to let decision makers analyze the trends perceived from the environment. SBI poses research challenges in several areas, such as IR, data mining, and NLP; unfortunately, SBI research is often restrained by the lack of publicly available, real-world data for experimenting with approaches, and by the difficulties in determining a ground truth. To fill this gap we present SABINE, a modular dataset in the domain of European politics. SABINE includes 6 million bilingual clips crawled from 50 000 web sources, each associated with metadata and sentiment scores; an ontology with 400 topics, their occurrences in the clips, and their mapping to DBpedia; and two multidimensional cubes for analyzing and aggregating sentiment and semantic occurrences. We also propose a set of research challenges that can be addressed using SABINE; remarkably, the presence of an expert-validated ground truth ensures the possibility of testing approaches to the whole SBI process as well as to each single task.
Year: 2018
DOI: 10.1007/978-3-030-00668-6_5
Venue: Lecture Notes in Computer Science
Keywords: Dataset, Social technologies, Sentiment analysis, Text analysis
Field: Data mining, Metadata, Ontology, Information retrieval, Computer science, Sentiment analysis, Ground truth, Modular design, Social business
DocType: Conference
Volume: 11137
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 10
Authors: 8
Name | Order | Citations | PageRank
S. Castano | 1 | 4 | 1.02
Alfio Ferrara | 2 | 710 | 59.86
Enrico Gallinucci | 3 | 51 | 8.60
Matteo Golfarelli | 4 | 1134 | 95.24
Stefano Montanelli | 5 | 422 | 42.17
Lorenzo Mosca | 6 | 7 | 1.58
Stefano Rizzi | 7 | 1488 | 111.52
Cristian Vaccari | 8 | 15 | 2.32