Deep Learning for Extreme Multi-label Text Classification - Citegraph

Paper Info

Title
Deep Learning for Extreme Multi-label Text Classification

Abstract
Extreme multi-label text classification (XMTC) refers to the problem of assigning to each document its most relevant subset of class labels from an extremely large label collection, where the number of labels could reach hundreds of thousands or millions. The huge label space raises research challenges such as data sparsity and scalability. Significant progress has been made in recent years by the development of new machine learning methods, such as tree induction with large-margin partitions of the instance spaces and label-vector embedding in the target space. However, deep learning has not been explored for XMTC, despite its big successes in other related areas. This paper presents the first attempt at applying deep learning to XMTC, with a family of new Convolutional Neural Network (CNN) models which are tailored for multi-label classification in particular. With a comparative evaluation of 7 state-of-the-art methods on 6 benchmark datasets where the number of labels is up to 670,000, we show that the proposed CNN approach successfully scaled to the largest datasets, and consistently produced the best or the second best results on all the datasets. On the Wikipedia dataset with over 2 million documents and 500,000 labels in particular, it outperformed the second best method by 11.7%~15.3% in precision@K and by 11.5%~11.7% in NDCG@K for K = 1,3,5.
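
The abstract reports results in precision@K and NDCG@K but this record does not give their formulas. The following is a minimal sketch assuming the standard binary-relevance definitions commonly used for ranking evaluation: precision@K is the fraction of the K top-scored labels that are relevant, and NDCG@K is the log-discounted gain of those labels divided by the best gain achievable for the document. The function names, the `scores` list, and the `relevant` set are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import math


def top_k_labels(scores, k):
    """Return the indices of the k highest-scored labels, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def precision_at_k(scores, relevant, k):
    """Fraction of the k top-ranked labels that are in the relevant set."""
    hits = sum(1 for i in top_k_labels(scores, k) if i in relevant)
    return hits / k


def ndcg_at_k(scores, relevant, k):
    """DCG of the top-k ranking, normalized by the ideal DCG.

    Assumes binary relevance: a relevant label at 0-based position r
    contributes 1 / log2(r + 2).
    """
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, i in enumerate(top_k_labels(scores, k))
        if i in relevant
    )
    ideal_hits = min(k, len(relevant))
    if ideal_hits == 0:
        return 0.0  # no relevant labels, so the metric is undefined; return 0
    ideal_dcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / ideal_dcg


if __name__ == "__main__":
    # Hypothetical document with five candidate labels; labels 0 and 4 are relevant.
    scores = [0.9, 0.1, 0.8, 0.3, 0.7]
    relevant = {0, 4}
    for k in (1, 3, 5):
        print(k, precision_at_k(scores, relevant, k), ndcg_at_k(scores, relevant, k))
```

Both functions score a single document; a dataset-level figure such as those quoted in the abstract would typically be the mean over all test documents.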

Year	DOI	Venue
2017	10.1145/3077136.3080834	SIGIR
Field	DocType	ISBN
Learning to rank,Data mining,Embedding,Information retrieval,Convolutional neural network,Computer science,Artificial intelligence,Deep learning,Machine learning,Scalability	Conference	978-1-4503-5022-8
Citations	PageRank	References
45	1.03	36
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Jingzhou Liu	1	66	3.73
Wei-Cheng Chang	2	169	9.94
Yuexin Wu	3	99	5.78
Yiming Yang	4	5390	500.59
