Title | ||
---|---|---|
Rhythm Based Music Segmentation And Octave Scale Cepstral Features For Sung Language Recognition |
Abstract | ||
---|---|---|
Sung language recognition relies on both effective feature extraction and acoustic modeling. In this paper, we study rhythm-based music segmentation (BSS), in which the frame size varies in proportion to the inter-beat interval of the music, in contrast to the fixed-length segmentation (FIX) used in spoken language recognition. We show that acoustic features extracted with the BSS scheme outperform those extracted with FIX. We also compare the effectiveness of musically motivated acoustic features, octave scale cepstral coefficients (OSCCs), with log frequency cepstral coefficients. We adopt a Gaussian mixture model for the sung language classifier design. Experiments conducted on a database of 400 popular songs sung in four languages (English, Chinese, German and Indonesian) show that the OSCC feature outperforms the other features. We achieve 64.9% sung language identification accuracy with Gaussian mixture models trained on shifted-delta-cepstral OSCC acoustic features extracted via BSS. |
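
The idea of a frame size that varies in proportion to the inter-beat interval can be illustrated with a minimal sketch. The abstract does not specify how beats are detected or how many frames are placed per beat, so the function name `beat_proportional_frames`, the `frames_per_beat` parameter, and the example beat times below are all assumptions made for illustration, not the authors' implementation.

```python
def beat_proportional_frames(beat_times, frames_per_beat=4):
    """Return (start, end) frame boundaries in seconds.

    Each inter-beat interval is split into `frames_per_beat` equal frames,
    so the frame length scales with the local inter-beat interval.
    Both the splitting rule and the default of 4 are illustrative assumptions.
    """
    if frames_per_beat < 1:
        raise ValueError("frames_per_beat must be >= 1")
    frames = []
    for start, end in zip(beat_times, beat_times[1:]):
        if end <= start:
            raise ValueError("beat_times must be strictly increasing")
        step = (end - start) / frames_per_beat
        for k in range(frames_per_beat):
            frames.append((start + k * step, start + (k + 1) * step))
    return frames


# Hypothetical beat positions (seconds); the last interval is slower.
beats = [0.0, 0.5, 1.0, 1.8]
frames = beat_proportional_frames(beats, frames_per_beat=2)
for start, end in frames:
    print(f"{start:.2f} -> {end:.2f}  (length {end - start:.2f} s)")
```

With fixed-length segmentation every frame would have the same duration; here the frames in the slower final interval are longer than those in the first two intervals.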
Year | Venue | Keywords |
---|---|---|
2008 | INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | Sung Language, Octave scale, vocal detection |
Field | DocType | Citations |
---|---|---|
Octave, Segmentation, Computer science, Cepstrum, Speech recognition, Language recognition, Rhythm | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 1 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Namunu Chinthaka Maddage | 1 | 108 | 11.28 |
Haizhou Li | 2 | 3678 | 334.61 |