Title
A study on the importance of and time spent on different modeling steps
Abstract
Applying data mining and machine learning algorithms requires many steps to prepare data and to make use of modeling results. This study investigates two questions: (1) how time-consuming are the pre- and post-processing steps? (2) how much research energy is spent on these steps? To answer these questions I surveyed practitioners about their experiences in applying modeling techniques and categorized data mining and machine learning research papers from 2009 according to the modeling step(s) they addressed. Survey results show that model building consumes only 14% of the time spent on a typical project; the remaining time is spent on pre- and post-processing steps. Both survey responses and the categorization of research papers show that data mining and machine learning researchers spend the majority of their energy on algorithms for constructing models and significantly less energy on other steps. These findings collectively suggest that there are research opportunities to simplify the steps that precede and follow model building.
Year: 2011
DOI: 10.1145/2207243.2207253
Venue: SIGKDD Explorations
Keywords: data mining, modeling step, research energy, post-processing step, different modeling step, model building, research paper, remaining time, modeling technique, research opportunity, time consuming, machine learning, categorical data
DocType: Journal
Volume: 13
Issue: 2
Citations: 6
PageRank: 0.51
References: 3
Authors: 1
Name: M. Arthur Munson
Order: 1
Citations: 28
PageRank: 1.81