Title
Combining Word Embeddings with Taxonomy Information for Multi-Label Document Classification.
Abstract
In business contexts, documents often need to be classified using company-specific taxonomies. Text-classification approaches based on word embeddings have become increasingly popular because they represent words, documents, and tags in a semantically robust way (as distributed representations of their contexts) and make documents and tags processable in an algebraic vector space. However, these distributed representations have shortcomings in multi-label classification tasks: the more similar the contexts of two tags, the harder the tags are to separate in classification. In practice, poor training data, poor training, or inherent limitations of the word-embedding approach intensify this effect and produce areas of indistinguishability, which lead to false-positive predictions (typically in leaf tags of a taxonomy tree). We contribute an approach that tackles the problem of indistinguishable areas in word-embedding-based multi-label classification by including taxonomy information during prediction.
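The abstract does not say how the taxonomy information enters the prediction step, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that tags are scored by cosine similarity to a document vector and that a tag is kept only if all of its taxonomy ancestors also pass the same threshold. The function names, the toy two-dimensional vectors, the tag names, and the threshold of 0.8 are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_tags(doc_vec, tag_vecs, parent, threshold=0.8):
    """Return the tags whose similarity to the document reaches `threshold`
    and whose taxonomy ancestors all reach it as well (assumed filter rule)."""
    scores = {tag: cosine(doc_vec, vec) for tag, vec in tag_vecs.items()}
    predicted = []
    for tag, score in scores.items():
        if score < threshold:
            continue
        # Walk up the taxonomy; an ancestor without a vector counts as unsupported.
        node = parent.get(tag)
        supported = True
        while node is not None:
            if scores.get(node, 0.0) < threshold:
                supported = False
                break
            node = parent.get(node)
        if supported:
            predicted.append(tag)
    return sorted(predicted)

# Hypothetical toy data: "banking" lies close to "football" in the vector space,
# but its parent "finance" is far from the document.
tag_vecs = {
    "sports": [1.0, 0.0],
    "football": [0.9, 0.2],
    "finance": [0.0, 1.0],
    "banking": [0.8, 0.3],
}
parent = {"football": "sports", "banking": "finance"}  # child -> parent
doc_vec = [1.0, 0.1]

flat = predict_tags(doc_vec, tag_vecs, parent={})         # taxonomy ignored
filtered = predict_tags(doc_vec, tag_vecs, parent=parent)  # taxonomy applied
print(flat)
print(filtered)
```

In this toy setup the leaf tag "banking" passes the similarity threshold on its own, which mirrors the false positives in leaf tags that the abstract describes. The ancestor check removes it because the document is not similar to "finance".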
Year
2019
DOI
10.1145/3342558.3345424
Venue
DocEng
Keywords
multi-label document classification, text tagging, keyword identification, word embeddings, taxonomy
Field
Document classification, Information retrieval, Computer science
DocType
Conference
ISBN
978-1-4503-6887-2
Citations
0
PageRank
0.34
References
0
Authors
2
Name
Order
Citations
PageRank
Stefan Hirschmeier101.35
Detlef Schoder235046.46