Title
An integrated development environment for faster feature engineering
Abstract
The application of machine learning to large datasets has become a core component of many important and exciting software systems being built today. The extreme value in these trained systems is tempered, however, by the difficulty of constructing them. As shown by the experience of Google, Netflix, IBM, and many others, a critical problem in building trained systems is that of feature engineering. High-quality machine learning features are crucial for the system's performance but are difficult and time-consuming for engineers to develop. Data-centric developer tools that improve the productivity of feature engineers will thus likely have a large impact on an important area of work. We have built a demonstration integrated development environment for feature engineers. It accelerates one particular step in the feature engineering development cycle: evaluating the effectiveness of novel feature code. In particular, it uses an index and runtime execution planner to process raw data objects (e.g., Web pages) in order of descending likelihood that the data object will be relevant to the user's feature code. This demonstration IDE allows the user to write arbitrary feature code, evaluate its impact on learner quality, and observe exactly how much faster our technique performs compared to a baseline system.
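The ordering idea in the abstract (process raw data objects in descending likelihood of relevance to the feature code) can be illustrated with a minimal sketch. This is not the authors' index or execution planner, which the abstract does not specify. The function names `process_in_relevance_order`, `keyword_score`, and `dollar_amount`, the keyword-based score, and the sample documents are all hypothetical, chosen only to show how a priority queue can surface likely-relevant inputs first under a fixed processing budget.

```python
import heapq
import re
from typing import Callable, Iterable, List, Optional, Tuple


def process_in_relevance_order(
    docs: Iterable[str],
    score: Callable[[str], float],
    feature_fn: Callable[[str], Optional[float]],
    budget: int,
) -> List[Tuple[str, float]]:
    """Run feature_fn on at most `budget` docs, highest estimated score first."""
    # heapq is a min-heap, so scores are negated; the position i breaks ties
    # in favor of the original input order.
    heap = [(-score(d), i, d) for i, d in enumerate(docs)]
    heapq.heapify(heap)

    results: List[Tuple[str, float]] = []
    processed = 0
    while heap and processed < budget:
        _, _, doc = heapq.heappop(heap)
        processed += 1
        value = feature_fn(doc)
        if value is not None:  # keep only docs where the feature fired
            results.append((doc, value))
    return results


def keyword_score(doc: str) -> float:
    """Hypothetical stand-in for an index lookup: 1.0 if 'price' appears."""
    return 1.0 if "price" in doc.lower() else 0.0


def dollar_amount(doc: str) -> Optional[float]:
    """Hypothetical feature code: extract the first dollar amount, if any."""
    match = re.search(r"\$(\d+(?:\.\d+)?)", doc)
    return float(match.group(1)) if match else None


docs = ["price: $10 for widget", "hello world", "about us", "sale price $5"]
prioritized = process_in_relevance_order(docs, keyword_score, dollar_amount, budget=2)
# A baseline that ignores the score simply takes documents in input order.
baseline = process_in_relevance_order(docs, lambda d: 0.0, dollar_amount, budget=2)
```

With a budget of two documents, the prioritized run yields two feature values while the input-order baseline yields one, which is the kind of gap the demonstration IDE is described as exposing to the user.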
Year
2014
DOI
10.14778/2733004.2733054
Venue
PVLDB
Field
Data mining, IBM, Web page, Computer science, Raw data, Planner, Software system, Feature model, Feature engineering, Artificial intelligence, Development environment, Database, Machine learning
DocType
Journal
Volume
7
Issue
13
ISSN
2150-8097
Citations
5
PageRank
0.40
References
7
Authors
5
Name                  Order  Citations  PageRank
Michael Anderson      1      125        19.21
Michael J. Cafarella  2      2246       144.15
Yixing Jiang          3      5          0.73
Guan Wang             4      5          0.40
Bochun Zhang          5      9          1.05