Title | ||
---|---|---|
A convolutional neural network approach for gender and language variety identification. |
Abstract | ||
---|---|---|
We present a method for gender and language variety identification using a convolutional neural network (CNN). We compare its performance with that of a traditional machine learning algorithm, support vector machines (SVM), trained on character n-grams (n = 3-8), lexical features (word unigrams and bigrams), and their combinations. We use a single multi-labeled corpus of news articles in different varieties of Spanish, developed specifically for these tasks. The CNN architecture is trained on word- and sentence-level embeddings and can be successfully applied to gender and language variety identification on a relatively small corpus (fewer than 10,000 documents). Our experiments show that the deep learning approach outperforms the traditional machine learning approach on both tasks when named entities are present in the corpus. However, when all named entities are reduced to a single symbol "NE" to avoid topic-dependent features, the drop in accuracy is larger for the deep learning approach. |
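
The baseline features named in the abstract (character n-grams with n = 3-8, word unigrams and bigrams, and the "NE" masking step) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not state the tokenizer, the named entity recognizer, or the feature weighting, so this sketch uses whitespace tokenization, raw counts, and a hand-supplied entity list. The function names `char_ngrams`, `word_ngrams`, and `mask_entities` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=3, n_max=8):
    """Count every character n-gram with n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text, n_max=2):
    """Count word n-grams from unigrams up to n_max-grams.

    Assumption: plain whitespace tokenization.
    """
    tokens = text.split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def mask_entities(text, entities):
    """Replace each listed named entity with the single symbol "NE".

    Assumption: the entity list is given by the caller. Plain substring
    replacement is a simplification and would also match inside longer words.
    """
    # Longest entities first, so a multi-word entity is replaced as a whole.
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, "NE")
    return text


masked = mask_entities("Ana vive en Buenos Aires", ["Ana", "Buenos Aires"])
features = char_ngrams(masked) + word_ngrams(masked)
```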
Year | DOI | Venue |
---|---|---|
2019 | 10.3233/JIFS-179032 | JOURNAL OF INTELLIGENT & FUZZY SYSTEMS |
Keywords | Field | DocType |
Convolutional neural networks,deep learning,author profiling,gender identification,language variety identification,machine learning,character n-grams,Spanish | Convolutional neural network,Gender and Language,Artificial intelligence,Deep learning,Machine learning,Mathematics | Journal |
Volume | Issue | ISSN |
36 | SP5 | 1064-1246 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Helena Gómez-Adorno | 1 | 40 | 16.01 |
Roddy Fuentes-Alba | 2 | 0 | 0.34 |
Ilia Markov | 3 | 5 | 5.23 |
Grigori Sidorov | 4 | 398 | 60.51 |
Alexander Gelbukh | 5 | 2843 | 269.19 |