Title
Large-Scale Taxonomy Categorization For Noisy Product Listings
Abstract
E-commerce catalogs include a continuously growing number of products that are constantly updated. Each item in a catalog is characterized by several attributes and identified by a taxonomy label. Categorizing products with their taxonomy labels is fundamental to effectively searching and organizing listings in a catalog. However, manual and/or rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for product taxonomy categorization of top-level categories. We first investigate a number of feature sets and observe that a combination of word unigrams from product names and navigational breadcrumbs works best for categorization. Second, we apply correspondence topic models to detect noisy data and introduce a lightweight manual process to improve dataset quality. Finally, we evaluate linear models, gradient boosted trees (GBTs), and convolutional neural networks (CNNs) with pre-trained word embeddings, demonstrating that, compared to other baselines, GBTs and CNNs yield the highest gains in error reduction.
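The feature set named in the abstract (word unigrams from product names and navigational breadcrumbs) feeding a linear model can be illustrated with a minimal sketch. This is not the authors' implementation: the multiclass perceptron stands in for the unspecified "linear models", and the toy listings, the ">" breadcrumb separator, and the names `unigram_features` and `Perceptron` are all assumptions made for illustration.

```python
def unigram_features(name, breadcrumb):
    """Count lowercase word unigrams, tagged by the field they came from."""
    feats = {}
    for tok in name.lower().split():
        key = "name=" + tok
        feats[key] = feats.get(key, 0) + 1
    # Assumption: breadcrumb levels are separated by ">".
    for tok in breadcrumb.lower().replace(">", " ").split():
        key = "crumb=" + tok
        feats[key] = feats.get(key, 0) + 1
    return feats


class Perceptron:
    """Multiclass perceptron: one sparse weight vector per top-level label."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = {label: {} for label in self.labels}

    def score(self, feats, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) * v for f, v in feats.items())

    def predict(self, feats):
        return max(self.labels, key=lambda label: self.score(feats, label))

    def fit(self, examples, epochs=10):
        for _ in range(epochs):
            for feats, gold in examples:
                guess = self.predict(feats)
                if guess != gold:
                    # Move weights toward the gold label, away from the wrong guess.
                    for f, v in feats.items():
                        self.weights[gold][f] = self.weights[gold].get(f, 0.0) + v
                        self.weights[guess][f] = self.weights[guess].get(f, 0.0) - v


# Hypothetical toy data: (product name, breadcrumb, top-level category).
TOY_LISTINGS = [
    ("usb c charging cable", "Electronics > Accessories", "Electronics"),
    ("wireless bluetooth headphones", "Electronics > Audio", "Electronics"),
    ("cotton crew neck t-shirt", "Clothing > Men > Shirts", "Clothing"),
    ("slim fit denim jeans", "Clothing > Women > Pants", "Clothing"),
]

train = [(unigram_features(n, b), y) for n, b, y in TOY_LISTINGS]
model = Perceptron({y for _, _, y in TOY_LISTINGS})
model.fit(train)

print(model.predict(unigram_features("bluetooth speaker", "Electronics > Audio")))
```

Tagging each unigram with its source field keeps name tokens and breadcrumb tokens as separate features, which is one simple way to combine the two sources; the paper may combine them differently.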
Year
2016
Venue
2016 IEEE International Conference on Big Data (Big Data)
Keywords
Categorization, Taxonomy label, CNNs
Field
Data mining, Data modeling, Categorization, Rule-based system, Computer science, Convolutional neural network, Artificial intelligence, Topic model, Product classification, Artificial neural network, Machine learning, Scalability
DocType
Conference
Citations
1
PageRank
0.36
References
19
Authors
5
Name                   Order  Citations  PageRank
Pradipto Das           1      105        6.43
Yandi Xia              2      1          0.36
Aaron Levine           3      1          0.36
Giuseppe Di Fabbrizio  4      330        44.45
Anupam Datta           5      1617       87.21