An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content - Citegraph

Paper Info

Title
An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content

Abstract
Audio-based video event detection (VED) on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than a sound, such as music, clapping or singing. The difficulty of video content analysis on UGC lies in the acoustic variability and lack of structure of the data. The UGC task has been explored mainly by computer vision, but can benefit from the use of audio. The i-vector system is the state of the art in speaker verification and outperforms a conventional Gaussian Mixture Model (GMM)-based approach. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper employs the i-vector-based system for audio-based VED on UGC and expands the understanding of the system on the task. It also includes a performance comparison with the conventional GMM-based and state-of-the-art Random Forest (RF)-based systems. The i-vector system aids audio-based event detection by addressing UGC audio characteristics. It outperforms the GMM-based system, is competitive with the RF-based system in terms of the Missed Detection (MD) rate at 4% and 2.8% False Alarm (FA) rates, and complements the RF-based system: combining the two yields a slight improvement over either standalone system.
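
The abstract reports Missed Detection (MD) rates at fixed False Alarm (FA) rates. As a rough illustration of what such an operating point means, the following minimal Python sketch picks the lowest decision threshold whose FA rate stays within a given budget and reports the MD rate there. The function name `md_rate_at_fa`, the "detect when score >= threshold" rule, and the toy scores are assumptions made for this example only; this record does not describe the paper's actual scoring protocol.

```python
import math


def md_rate_at_fa(target_scores, nontarget_scores, max_fa_rate):
    """Return (threshold, fa_rate, md_rate) at a fixed false-alarm budget.

    Hypothetical helper: a video is "detected" when its score >= threshold.
    FA rate = fraction of non-target videos that are detected.
    MD rate = fraction of target videos that are not detected.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Try every observed score as a threshold, in ascending order.
    # math.inf is a fallback that rejects everything (FA rate 0).
    candidates = sorted(set(target_scores) | set(nontarget_scores)) + [math.inf]
    for threshold in candidates:
        fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        if fa <= max_fa_rate:
            # The first (lowest) threshold within budget misses the fewest targets.
            md = sum(s < threshold for s in target_scores) / len(target_scores)
            return threshold, fa, md


# Toy scores, invented for illustration only.
targets = [0.9, 0.8, 0.7, 0.4, 0.05]
nontargets = [0.6] + [0.1] * 24

for budget in (0.04, 0.028):
    threshold, fa, md = md_rate_at_fa(targets, nontargets, budget)
    print(f"FA budget {budget:.3f}: threshold={threshold}, FA={fa:.3f}, MD={md:.3f}")
```

Tightening the FA budget can only raise the threshold, so the MD rate at 2.8% FA is never lower than at 4% FA for the same set of scores.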

Year	DOI	Venue
2013	10.1109/ISM.2013.27	Multimedia
Keywords	Field	DocType
i-vector-based system,gmm-based system,acoustic environments,audio-based video event detection,user generated content,acoustic variability,ugc task,ugc audio characteristic,i-vector system,acoustic environment,rf-based system,i-vector representation,standalone system,gaussian processes,speaker recognition,audio signal processing,mixture models	User-generated content,Computer vision,Object detection,False alarm,Computer science,Speech recognition,Video content analysis,Speaker recognition,Audio signal flow,Artificial intelligence,Audio signal processing,Mixture model	Conference
ISBN	Citations	PageRank
978-0-7695-5140-1	9	0.76
References	Authors
5	3

Authors (3 rows)

Name	Order	Citations	PageRank
Benjamin Elizalde	1	359	22.38
Howard Lei	2	112	6.90
Gerald Friedland	3	1127	96.23
