Title |
---|
End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-character Recognition Model |
Abstract |
---|
Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval, intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to combine numerous sub-modules, including vocal separation and detection, in an effort to break down the problem. Furthermore, training these systems has required fine-grained annotations to be available in some form. Here, we present a novel system based on a modified Wave-U-Net architecture, which predicts character probabilities directly from raw audio using learnt multi-scale representations of the various signal components. There are no sub-modules whose interdependencies need to be optimized. Our training procedure is designed to work with weak, line-level annotations available in the real world. With a mean alignment error of 0.35 s on a standard dataset, our system outperforms the state of the art by an order of magnitude. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICASSP.2019.8683470 | ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Keywords | Field | DocType |
Lyrics alignment, multi-scale representation, neural networks, CTC training, lyrics transcription | Architecture, Character recognition, Pattern recognition, Computer science, End-to-end principle, Active listening, Speech recognition, Raw audio format, Artificial intelligence, Lyrics, Polyphony, Artificial neural network | Journal |
Volume | ISSN | ISBN |
abs/1902.06797 | 1520-6149 | 978-1-4799-8131-1 |
Citations | PageRank | References |
4 | 0.49 | 8 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Daniel Stoller | 1 | 18 | 3.55 |
Simon Durand | 2 | 25 | 3.02 |
Sebastian Ewert | 3 | 386 | 27.29 |