Abstract |
---|
Several areas, such as science, economics, finance, business intelligence, and health, are exploring big data as a way to produce new information, make better decisions, and advance their related technologies and systems. In health specifically, big data is a challenging problem because data quality is poor in some circumstances and because huge amounts of data must be retrieved, aggregated, and processed from disparate databases. In this work, we focus on the Brazilian Public Health System and on large databases from the Ministry of Health and the Ministry of Social Development and Hunger Alleviation. We present our Spark-based approach to data processing and probabilistic record linkage of these databases in order to produce very accurate data marts. These data marts are used by statisticians and epidemiologists to assess the effectiveness of conditional cash transfer programs for poor families with respect to the occurrence of certain diseases (tuberculosis, leprosy, and AIDS). The proof-of-concept case study shows good performance with accurate results. For comparison, we also discuss an OpenMP-based implementation. |
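
To make the idea of probabilistic record linkage mentioned in the abstract concrete, here is a minimal sketch in plain Python rather than Spark. It is not the authors' method: the bigram Dice similarity, the field names, the weights, and the two thresholds are all assumptions chosen for illustration only.

```python
def bigrams(text):
    """Return the set of character bigrams of a lowercased, space-free string."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings (0.0 to 1.0)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0  # strings too short to compare by bigrams
    return 2 * len(x & y) / (len(x) + len(y))


def link_score(rec_a, rec_b, weights):
    """Weighted average of per-field similarities (weights are hypothetical)."""
    total = sum(weights.values())
    return sum(w * dice(rec_a[f], rec_b[f]) for f, w in weights.items()) / total


def classify(score, upper=0.9, lower=0.6):
    """Map a score to a linkage decision; both thresholds are assumed values."""
    if score >= upper:
        return "match"
    if score >= lower:
        return "possible"
    return "non-match"


# Hypothetical records from two databases describing the same person.
a = {"name": "Maria Souza", "city": "Salvador"}
b = {"name": "Maria Sousa", "city": "Salvador"}
weights = {"name": 2.0, "city": 1.0}
print(classify(link_score(a, b, weights)))
```

Pairs scored as "possible" fall between the two thresholds and would typically go to manual review; in a distributed setting the pairwise scoring step is the part that would be parallelized.
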
Year | Venue | Field |
---|---|---|
2015 | EDBT/ICDT Workshops | Health care, Data science, Record linkage, Spark (mathematics), Information retrieval, Computer science, Probabilistic logic, Business intelligence, Big data, Workflow, Conditional cash transfer |
DocType | Citations | PageRank |
---|---|---|
Conference | 4 | 0.47 |
References | Authors |
---|---|
7 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Robespierre Pita | 1 | 5 | 2.54 |
Clicia Pinto | 2 | 5 | 2.20 |
Pedro Melo | 3 | 5 | 0.85 |
Malu Silva | 4 | 4 | 0.47 |
Marcos E. Barreto | 5 | 118 | 13.10 |
Davide Rasella | 6 | 4 | 0.47 |