Title
A Spark-based Workflow for Probabilistic Record Linkage of Healthcare Data.
Abstract
Several areas, such as science, economics, finance, business intelligence, and health, are exploring big data as a way to produce new information, make better decisions, and advance their related technologies and systems. In health specifically, big data is a challenging problem because data quality is poor in some circumstances and because huge amounts of data must be retrieved, aggregated, and processed from disparate databases. In this work, we focus on the Brazilian Public Health System and on large databases from the Ministry of Health and the Ministry of Social Development and Hunger Alleviation. We present our Spark-based approach to data processing and probabilistic record linkage of these databases in order to produce very accurate data marts. These data marts are used by statisticians and epidemiologists to assess the effectiveness of conditional cash transfer programs for poor families with respect to the occurrence of certain diseases (tuberculosis, leprosy, and AIDS). The case study we carried out as a proof of concept shows good performance with accurate results. For comparison, we also discuss an OpenMP-based implementation.
Year
2015
Venue
EDBT/ICDT Workshops
Field
Health care, Data science, Record linkage, Spark (mathematics), Information retrieval, Computer science, Probabilistic logic, Business intelligence, Big data, Workflow, Conditional cash transfer
DocType
Conference
Citations
4
PageRank
0.47
References
7
Authors
6
Name                Order  Citations  PageRank
Robespierre Pita    1      5          2.54
Clicia Pinto        2      5          2.20
Pedro Melo          3      5          0.85
Malu Silva          4      4          0.47
Marcos E. Barreto   5      118        13.10
Davide Rasella      6      4          0.47