Abstract | ||
---|---|---|
E-commerce catalogs include a continuously growing number of products that are constantly updated. Each item in a catalog is characterized by several attributes and identified by a taxonomy label. Categorizing products with their taxonomy labels is fundamental to effectively searching and organizing listings in a catalog. However, manual and/or rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for product taxonomy categorization of top-level categories. First, we investigate a number of feature sets and observe that a combination of word unigrams from product names and navigational breadcrumbs works best for categorization. Second, we apply correspondence topic models to detect noisy data and introduce a lightweight manual process to improve dataset quality. Finally, we evaluate linear models, gradient boosted trees (GBTs), and convolutional neural networks (CNNs) with pre-trained word embeddings, demonstrating that, compared to other baselines, GBTs and CNNs yield the highest gains in error reduction. |
Year | Venue | Keywords |
---|---|---|
2016 | 2016 IEEE International Conference on Big Data (Big Data) | Categorization, Taxonomy label, CNNs |
Field | DocType | Citations |
Data mining, Data modeling, Categorization, Rule-based system, Computer science, Convolutional neural network, Artificial intelligence, Topic model, Product classification, Artificial neural network, Machine learning, Scalability | Conference | 1 |
PageRank | References | Authors |
0.36 | 19 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Pradipto Das | 1 | 105 | 6.43 |
Yandi Xia | 2 | 1 | 0.36 |
Aaron Levine | 3 | 1 | 0.36 |
Giuseppe Di Fabbrizio | 4 | 330 | 44.45 |
Anupam Datta | 5 | 1617 | 87.21 |