Title
Improving The Performance Of Data Mining By Using Big Data In Cloud Environment
Abstract
The volume of business data is growing very quickly, and most of these data are relational. Extracting knowledge with data mining requires keeping all historical data, which increasingly complicates processing and storage and demands power and capacity beyond the ability of any single machine. Distributed environments such as cloud computing therefore become very useful, because they share storage and processing across multiple nodes. Unfortunately, data based on the relational model cannot easily be used in the cloud because of the model's rigidity and its limited elasticity in such environments. To address this issue, new big data systems such as NoSQL have appeared that make data easier to share and distribute in cloud environments. In theory this benefits the data mining use case; in practice it must be demonstrated by evaluating the performance of multi-node NoSQL against single-node relational systems. In the cloud case, it is also of interest to know whether performance keeps increasing in proportion to the number of nodes, and whether there is an optimum number of nodes at which performance becomes nearly steady or starts to drop off. Motivated by these questions, we propose in this paper an approach for migrating relational data to an appropriate NoSQL system in a cloud environment, and we then evaluate performance to obtain results of interest for data mining. For the experiments, we use industrial data deployed in the data mining process of an oil and gas company. After migrating these data, we run experiments to compare and evaluate storage, processing, and execution time. Our objectives are to verify data elasticity, measure run-time performance, and try to find the optimum number of nodes.
Year: 2016
DOI: 10.1142/S0219649216500386
Venue: Journal of Information & Knowledge Management
Keywords: Big data, data mining, NoSQL, cloud computing, relational data
Field: Data warehouse, Data mining, Data stream mining, Relational database, Computer science, NoSQL, Data virtualization, Elasticity (data store), Big data, Database, Cloud computing
DocType: Journal
Volume: 15
Issue: 4
ISSN: 0219-6492
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name               Order  Citations  PageRank
Djilali Dahmani    1      0          0.34
Sid Ahmed Rahal    2      0          0.34
Ghalem Belalem     3      106        30.12