Title
An Efficient Data Extracting Method Based on Hadoop.
Abstract
As an open-source big data solution, the Hadoop ecosystem has been widely adopted and applied. However, importing large volumes of data from traditional relational databases into Hadoop within a short time has become a major challenge in the ETL (Extract-Transform-Load) stage of big data processing. This paper presents an efficient parallel data extraction method based on Hadoop that uses the MapReduce computation engine to call the JDBC (Java Database Connectivity) interface for data extraction. To address the problem of partitioning the input data across multiple Map tasks, the paper proposes a dynamic segmentation algorithm based on range partitioning, which effectively avoids data skew and distributes the input data more uniformly across the Map tasks. Experimental results show that, compared with the ETL tool Sqoop, which also uses the MapReduce engine, the proposed method divides the input data more uniformly and takes less time to extract the same data.
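The abstract describes splitting a relational table into range partitions so that each Map task extracts a roughly equal share of rows via JDBC. The following Java sketch is illustrative only and is not the paper's exact algorithm: it assumes a sorted sample of the split column is available and derives quantile boundaries from it, then emits one WHERE-clause predicate per Map task. The column name "id", the sample values, and the split count are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch: build range-partition predicates for parallel JDBC
 * extraction so that each map task receives a roughly equal number of rows.
 * Split points are taken as quantiles of a sorted sample of the split column,
 * which avoids the skew produced by dividing [min, max] into equal-width ranges.
 */
public class RangePartitionSplitter {

    /** Pick (numSplits - 1) quantile boundaries from a sorted sample of key values. */
    static long[] quantileBoundaries(long[] sortedSample, int numSplits) {
        long[] bounds = new long[numSplits - 1];
        for (int i = 1; i < numSplits; i++) {
            int idx = (int) ((long) i * sortedSample.length / numSplits);
            bounds[i - 1] = sortedSample[Math.min(idx, sortedSample.length - 1)];
        }
        return bounds;
    }

    /** Turn boundaries into WHERE-clause predicates, one per map task. */
    static List<String> buildPredicates(String splitColumn, long[] bounds) {
        List<String> predicates = new ArrayList<>();
        for (int i = 0; i <= bounds.length; i++) {
            if (i == 0) {
                predicates.add(splitColumn + " < " + bounds[0]);
            } else if (i == bounds.length) {
                predicates.add(splitColumn + " >= " + bounds[i - 1]);
            } else {
                predicates.add(splitColumn + " >= " + bounds[i - 1]
                        + " AND " + splitColumn + " < " + bounds[i]);
            }
        }
        return predicates;
    }

    public static void main(String[] args) {
        // Hypothetical skewed sample of primary-key values from the source table.
        long[] sample = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 500, 1000, 5000, 9000, 10000, 10001};
        Arrays.sort(sample);

        long[] bounds = quantileBoundaries(sample, 4);            // 4 map tasks (assumed)
        List<String> predicates = buildPredicates("id", bounds);  // split column "id" (assumed)

        // Each predicate would be appended to the query run by one mapper, e.g.
        // "SELECT * FROM source_table WHERE id >= 5 AND id < 9".
        predicates.forEach(System.out::println);
    }
}
```

In a full MapReduce job, each generated predicate would define one input split, and the corresponding mapper would open a JDBC connection and execute the bounded query, so that skewed key distributions still yield balanced per-mapper workloads.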
Year
2014
DOI
10.1007/978-3-319-16050-4_8
Venue
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Keywords
ETL, Hadoop, MapReduce, Big data, Range Partition
Field
Big data processing, Data mining, Division (mathematics), Relational database, Computer science, Segmentation, Data extraction, Big data, Java, Distributed computing, Computation
DocType
Conference
Volume
142
ISSN
1867-8211
Citations
1
PageRank
0.35
References
5
Authors
5
Name            Order  Citations  PageRank
Lianchao Cao    1      1          0.35
Zhanqiang Li    2      1          0.35
Kaiyuan Qi      3      1          0.69
Guomao Xin      4      1          0.35
Dong Zhang      5      1          1.03