Title |
---|
MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks. |
Abstract |
---|
Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. It then generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). Compared with human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. Availability and Implementation: Source code, binaries and documentation are available as open source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. |
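
The three kinds of transformation named in the abstract (unit conversion, categorical value matching, and a complex pattern such as BMI) can be illustrated with a minimal Python sketch. This is not MOLGENIS/connect's generated code or its API: the attribute names (`sex_code`, `height_cm`, `weight_kg`), the category mapping, and the helper functions are all hypothetical, chosen only to show what such a source-to-target algorithm might look like.

```python
# Illustrative sketch only; names and codings below are assumptions,
# not taken from MOLGENIS/connect.

CM_PER_M = 100.0

# Assumed mapping from a source coding of sex to a target coding.
SEX_SOURCE_TO_TARGET = {"M": "male", "F": "female"}


def height_cm_to_m(height_cm):
    """Unit conversion: centimetres to metres."""
    return height_cm / CM_PER_M


def match_category(value, mapping):
    """Categorical value matching; unmapped values become None for manual review."""
    return mapping.get(value)


def bmi(weight_kg, height_m):
    """Complex conversion pattern: BMI = weight (kg) / height (m) squared."""
    return weight_kg / (height_m ** 2)


def to_target(source):
    """Transform one hypothetical source record into the target schema."""
    height_m = height_cm_to_m(source["height_cm"])
    return {
        "sex": match_category(source["sex_code"], SEX_SOURCE_TO_TARGET),
        "height_m": height_m,
        "bmi": round(bmi(source["weight_kg"], height_m), 1),
    }


record = {"sex_code": "F", "height_cm": 170.0, "weight_kg": 65.0}
print(to_target(record))
```

Returning `None` for an unmapped category is one possible design choice: it leaves the value visible for a human curator to resolve, in keeping with the semi-automatic workflow the abstract describes.
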
Year | DOI | Venue |
---|---|---|
2016 | 10.1093/bioinformatics/btw155 | BIOINFORMATICS |
Field | DocType | Volume |
---|---|---|
Data mining, Ontology, Conversion of units, Terminology, Query expansion, Computer science, Categorical variable, Source code, Software, Bioinformatics, Documentation | Journal | 32 |
Issue | ISSN | Citations |
---|---|---|
14 | 1367-4803 | 3 |
PageRank | References | Authors |
---|---|---|
0.48 | 6 | 14 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chao Pang | 1 | 143 | 19.04 |
David van Enckevort | 2 | 5 | 2.90 |
Mark de Haan | 3 | 9 | 2.06 |
Fleur Kelpin | 4 | 9 | 1.72 |
Jonathan Jetten | 5 | 9 | 1.39 |
Dennis Hendriksen | 6 | 16 | 2.41 |
Tommy de Boer | 7 | 9 | 1.72 |
Bart Charbon | 8 | 9 | 1.72 |
Erwin Winder | 9 | 3 | 0.48 |
K. Joeri van der Velde | 10 | 79 | 6.58 |
Dany Doiron | 11 | 3 | 0.48 |
Isabel Fortier | 12 | 3 | 0.48 |
Hans L. Hillege | 13 | 13 | 2.24 |
Morris A Swertz | 14 | 155 | 18.03 |