Title
AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications
Abstract
Feature crossing captures interactions among categorical features and is useful for enhancing learning from tabular data in real-world businesses. In this paper, we present AutoCross, an automatic feature crossing tool provided by 4Paradigm to its customers, which range from banks and hospitals to Internet corporations. By performing beam search in a tree-structured space, AutoCross enables efficient generation of high-order cross features, which has not yet been explored by existing works. Additionally, we propose successive mini-batch gradient descent and multi-granularity discretization to further improve efficiency and effectiveness, while ensuring simplicity so that no machine learning expertise or tedious hyper-parameter tuning is required. Furthermore, the algorithms are designed to reduce the computational, transmission, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross, showing that it can significantly enhance the performance of both linear and deep models.
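To make the notion of a cross feature concrete, the following is a minimal sketch of crossing categorical columns by joining their values into a single new categorical value. The helper name `cross_features`, the column names, and the sample rows are hypothetical and chosen only for illustration; this is not AutoCross's implementation, and it does not cover the search, discretization, or training procedures named in the abstract.

```python
def cross_features(rows, names):
    """Build one crossed categorical value per row from the named columns.

    Each crossed value is the combination of the row's values in `names`,
    so a cross of k columns is a k-th order cross feature.
    """
    return ["&".join(f"{n}={row[n]}" for n in names) for row in rows]


# Hypothetical tabular data with three categorical columns.
rows = [
    {"job": "engineer", "city": "Beijing", "device": "mobile"},
    {"job": "teacher", "city": "Beijing", "device": "desktop"},
    {"job": "engineer", "city": "Shanghai", "device": "mobile"},
]

# Second-order cross: interaction between two columns.
second_order = cross_features(rows, ["job", "city"])

# Third-order (high-order) cross: interaction among three columns.
third_order = cross_features(rows, ["job", "city", "device"])

for value in second_order + third_order:
    print(value)
```

The number of candidate crosses grows quickly with the order and the number of columns, which is why enumerating all of them is impractical and a guided search over candidates is attractive.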
Year: 2019
DOI: 10.1145/3292500.3330679
Venue: Knowledge Discovery and Data Mining
Keywords: AutoML, Feature Crossing, Tabular Data
Field: Data mining, Discretization, Gradient descent, Categorical variable, Beam search, Ranging, Artificial intelligence, Mathematics, Machine learning, The Internet
DocType:
Journal:
Citations: 9
PageRank: 0.58
References: 0
Authors (8)
Name            Order  Citations  PageRank
Luo Yuanfei     1      10         0.93
Mengshuo Wang   2      21         2.23
Zhou Hao        3      15         1.10
Quanming Yao    4      288        27.13
Wei-Wei Tu      5      33         7.86
Yuqiang Chen    6      67         3.23
Qiang Yang      7      17039      875.69
Wenyuan Dai     8      1142       49.14