Large-Scale Taxonomy Categorization For Noisy Product Listings - Citegraph

Paper Info

Title
Large-Scale Taxonomy Categorization For Noisy Product Listings

Abstract
E-commerce catalogs include a continuously growing number of products that are constantly updated. Each item in a catalog is characterized by several attributes and identified by a taxonomy label. Categorizing products with their taxonomy labels is fundamental to effectively searching and organizing listings in a catalog. However, manual and/or rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for product taxonomy categorization of top-level categories. First, we investigate a number of feature sets and observe that a combination of word unigrams from product names and navigational breadcrumbs works best for categorization. Second, we apply correspondence topic models to detect noisy data and introduce a lightweight manual process to improve dataset quality. Finally, we evaluate linear models, gradient boosted trees (GBTs) and convolutional neural networks (CNNs) with pre-trained word embeddings, demonstrating that, compared to other baselines, GBTs and CNNs yield the highest gains in error reduction.

Year	Venue	Keywords
2016	2016 IEEE International Conference on Big Data (Big Data)	Categorization, Taxonomy label, CNNs
Field	DocType	Citations
Data mining, Data modeling, Categorization, Rule-based system, Computer science, Convolutional neural network, Artificial intelligence, Topic model, Product classification, Artificial neural network, Machine learning, Scalability	Conference	1
PageRank	References	Authors
0.36	19	5

Authors (5 rows)

Name	Order	Citations	PageRank
Pradipto Das	1	105	6.43
Yandi Xia	2	1	0.36
Aaron Levine	3	1	0.36
Giuseppe Di Fabbrizio	4	330	44.45
Anupam Datta	5	1617	87.21
