Title
On the Use of Data Mining Tools for Data Preparation in Classification Problems
Abstract
The data preparation phase is a critical step in the KDD (Knowledge Discovery in Databases) process. A good data mining result depends on it, because if data is not correctly prepared, all subsequent phases of the process are compromised. DMPML is a framework that stores preprocessed data for different data mining algorithms in an XML document and uses an XSLT document to retrieve the encoding that a given data mining algorithm needs. This paper compares DMPML with three data mining applications that implement the directed graph approach (Weka, RapidMiner, and KNIME) with respect to the time spent creating and executing the data preparation tasks for two data mining algorithms. The tests were run on different types of data sets: numerical, categorical, and mixed. We observed that the scheme used by DMPML can simplify the use of different data mining algorithms and significantly reduce the time spent creating the data preparation tasks.
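The idea of keeping several precomputed encodings in one XML document and pulling out the one an algorithm needs can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the element and attribute names (`dataset`, `attribute`, `encoding`, `kind`, `v`) are hypothetical and are not the DMPML schema, and the selection step uses the limited XPath support in Python's standard `xml.etree.ElementTree` as a stand-in for the XSLT document the abstract describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout (NOT the real DMPML schema): each <attribute> stores
# several precomputed encodings of the same column, tagged by "kind".
DOC = """
<dataset>
  <attribute name="color">
    <encoding kind="nominal"><v>red</v><v>blue</v><v>red</v></encoding>
    <encoding kind="numeric"><v>0</v><v>1</v><v>0</v></encoding>
  </attribute>
  <attribute name="size">
    <encoding kind="nominal"><v>small</v><v>large</v><v>large</v></encoding>
    <encoding kind="numeric"><v>0</v><v>1</v><v>1</v></encoding>
  </attribute>
</dataset>
"""


def select_encoding(xml_text, kind):
    """Return {attribute name: list of values} for the requested encoding.

    Stand-in for the XSLT selection step: it only picks the stored
    encoding whose "kind" matches; it does not compute any encoding.
    """
    root = ET.fromstring(xml_text)
    columns = {}
    for attr in root.findall("attribute"):
        enc = attr.find(f"encoding[@kind='{kind}']")
        if enc is None:
            raise ValueError(f"no '{kind}' encoding for {attr.get('name')}")
        columns[attr.get("name")] = [v.text for v in enc.findall("v")]
    return columns


# An algorithm that needs numeric inputs and one that accepts nominal
# inputs read the same document and get different views of the data.
numeric_view = select_encoding(DOC, "numeric")
nominal_view = select_encoding(DOC, "nominal")
```

In this sketch the preparation work is done once when the document is written, and each algorithm only chooses a view, which is the kind of saving in task-creation time the abstract attributes to DMPML.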
Year
2012
DOI
10.1109/ICIS.2012.79
Venue
ACIS-ICIS
Keywords
data preparation, stores preprocessed data, data mining tools, xml document, data preparation task, data preparation phase, classification problems, different type, data mining application, good data mining result, different data mining algorithm, data mining algorithm, time measurement, data mining, directed graphs, testing, directed graph, xml
Field
Data mining, Concept mining, Data stream mining, XML, Data mapping, Computer science, Data pre-processing, Data type, Knowledge extraction, XSLT
DocType
Conference
Citations
1
PageRank
0.36
References
0
Authors
3
Name                      Order  Citations  PageRank
Paulo Mauricio Goncalves  1      32         3.33
Roberto S. M. Barros      2      72         8.68
Davi C. L. Vieira         3      2          0.71