Abstract |
---|
The massive amount of multimedia information currently available through the Internet demands efficient techniques to extract knowledge from Big Data. In this work, we propose an architecture to capture, process, analyze and visualize data coming from multiple streaming TV and radio stations. To this end, we rely on the Hadoop framework available within the IBM InfoSphere BigInsights platform. We create a workflow that automates the different stages, ranging from Automatic Speech Recognition using open-source tools to visualization by means of the R framework. We emphasize techniques such as diarization and the optimization of the number of Hadoop nodes, provisioned from Cloud infrastructures, to improve performance. The results show that it is possible to automate knowledge extraction from multimedia data running on virtualized infrastructures by means of Big Data techniques. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/PDP.2016.45 | 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) |
Keywords | Field | DocType |
Big Data,Hadoop,MapReduce,multimedia,R,BigInsights | Data visualization,Visualization,Computer science,Infosphere,Knowledge extraction,Big data,Multimedia,Workflow,The Internet,Distributed computing,Cloud computing | Conference |
ISSN | Citations | PageRank |
1066-6192 | 1 | 0.34 |
References | Authors |
8 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jose Herrera | 1 | 1 | 0.34 |
Germán Moltó | 2 | 171 | 18.92 |