Input selection for fast feature engineering - Citegraph

Paper Info

Title
Input selection for fast feature engineering

Abstract
The application of machine learning to large datasets has become a vital component of many important and sophisticated software systems built today. Such trained systems are often based on supervised learning tasks that require features, signals extracted from the data that distill complicated raw data objects into a small number of salient values. A trained system's success depends substantially on the quality of its features. Unfortunately, feature engineering (the process of writing code that takes raw data objects as input and outputs feature vectors suitable for a machine learning algorithm) is a tedious, time-consuming experience. Because “big data” inputs are so diverse, feature engineering is often a trial-and-error process requiring many small, iterative code changes. Because the inputs are so large, each code change can involve a time-consuming data processing task (over each page in a Web crawl, for example). We introduce Zombie, a data-centric system that accelerates feature engineering through intelligent input selection, optimizing the “inner loop” of the feature engineering process. Our system yields feature evaluation speedups of up to 8× in some cases and reduces engineer wait times from 8 to 5 hours in others.

Year	2016
DOI	10.1109/ICDE.2016.7498272
Venue	2016 IEEE 32nd International Conference on Data Engineering (ICDE)
Keywords	input selection, fast feature engineering, machine learning, supervised learning tasks, Big Data, data processing task, Zombie data-centric system
Field	Data mining, Semi-supervised learning, Computer science, Feature model, Feature engineering, Artificial intelligence, Feature vector, Feature (computer vision), Feature extraction, Supervised learning, Machine learning, Database, Feature learning
DocType	Conference
ISSN	1084-4627
Citations	7
PageRank	0.45
References	31
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Michael Anderson	1	125	19.21
Michael J. Cafarella	2	2246	144.15
