Title
"Hello? Who Am I Talking To?" A Shallow CNN Approach for Human vs. Bot Speech Classification
Abstract
Automatic speech generation algorithms, enhanced by deep learning techniques, enable increasingly seamless and immediate machine-to-human interaction. As a result, the latest generation of phone-calling bots sounds more convincingly human than previous generations. The application of this technology has a strong social impact in terms of privacy issues (e.g., in customer-care services), fraudulent actions (e.g., social hacking) and erosion of trust (e.g., generation of fake conversations). For these reasons, it is crucial to identify the nature of a speaker as either human or bot. In this paper, we propose a speech classification algorithm based on Convolutional Neural Networks (CNNs), which enables the automatic classification of human vs. non-human speakers from the analysis of short audio excerpts. We evaluate the effectiveness of the proposed solution on a database of real human speech, populated with audio recordings from various sources, and on speech generated automatically by state-of-the-art text-to-speech generators based on deep learning (e.g., Google WaveNet).
Year: 2019
DOI: 10.1109/icassp.2019.8682743
Venue: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords: Audio forensics, convolutional neural network, speaker detection
Field: Automatic speech, Conversation, Pattern recognition, Convolutional neural network, Computer science, Spectrogram, Speech recognition, Hacker, Speech classification, Artificial intelligence, Deep learning, Hidden Markov model
DocType: Conference
ISSN: 1520-6149
Citations: 0
PageRank: 0.34
References: 0
Authors: 7
Name              Order  Citations  PageRank
A. Lieto          1      0          0.34
D. Moro           2      0          0.34
F. Devoti         3      0          0.34
C. Parera         4      0          0.34
Vincenzo Lipari   5      0          1.69
Paolo Bestagini   6      261        32.01
Stefano Tubaro    7      1033       119.50