A Spark-based Workflow for Probabilistic Record Linkage of Healthcare Data. - Citegraph

Paper Info

Title
A Spark-based Workflow for Probabilistic Record Linkage of Healthcare Data.

Abstract
Several areas, such as science, economics, finance, business intelligence, and health, are exploring big data as a way to produce new information, make better decisions, and advance their related technologies and systems. In health specifically, big data is challenging because data quality is poor in some circumstances and because huge amounts of data must be retrieved, aggregated, and processed from disparate databases. In this work, we focus on the Brazilian Public Health System and on large databases from the Ministry of Health and the Ministry of Social Development and Hunger Alleviation. We present our Spark-based approach to data processing and probabilistic record linkage of these databases in order to produce very accurate data marts. Statisticians and epidemiologists use these data marts to assess the effectiveness of conditional cash transfer programs for poor families with respect to the occurrence of certain diseases (tuberculosis, leprosy, and AIDS). Our proof-of-concept case study shows good performance with accurate results. For comparison, we also discuss an OpenMP-based implementation.

Year	2015
Venue	EDBT/ICDT Workshops
Field	Health care, Data science, Record linkage, Spark (mathematics), Information retrieval, Computer science, Probabilistic logic, Business intelligence, Big data, Workflow, Conditional cash transfer
DocType	Conference
Citations	4
PageRank	0.47
References	7
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Robespierre Pita	1	5	2.54
Clicia Pinto	2	5	2.20
Pedro Melo	3	5	0.85
Malu Silva	4	4	0.47
Marcos E. Barreto	5	118	13.10
Davide Rasella	6	4	0.47
