MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks. - Citegraph

Paper Info

Title
MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.

Abstract
Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration.
Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data.
Availability and Implementation: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect.
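
The three kinds of transformation named in the abstract (unit conversion, categorical value matching, and a derived value such as BMI) can be illustrated with a minimal sketch. This is not the MOLGENIS/connect implementation or its algorithm syntax: the attribute names (`weight_kg`, `height_cm`, `smoking`), the target names (`BMI`, `CURRENT_SMOKER`) and the category mapping are all hypothetical, chosen only to show the shape of a source-to-target transform.

```python
from typing import Dict, Optional

# Hypothetical mapping from source categories to target categories
# (illustrative only; not taken from the paper).
SMOKING_MAP: Dict[str, str] = {"never": "NO", "former": "NO", "current": "YES"}


def cm_to_m(height_cm: float) -> float:
    """Unit conversion: centimetres to metres."""
    return height_cm / 100.0


def match_category(value: str, mapping: Dict[str, str]) -> Optional[str]:
    """Categorical value matching; returns None when no target category matches."""
    return mapping.get(value.strip().lower())


def bmi(weight_kg: float, height_cm: float) -> float:
    """Derived value: BMI = weight (kg) / height (m) squared."""
    height_m = cm_to_m(height_cm)
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / (height_m ** 2)


def transform(row: dict) -> dict:
    """Map one source record onto the (assumed) target attributes."""
    return {
        "BMI": round(bmi(row["weight_kg"], row["height_cm"]), 1),
        "CURRENT_SMOKER": match_category(row["smoking"], SMOKING_MAP),
    }
```

For example, `transform({"weight_kg": 70.0, "height_cm": 175.0, "smoking": "Current"})` yields a BMI of 22.9 and the target category `"YES"`. An unmatched category comes back as `None`, which is the kind of case a human curator would still need to review.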

Year	2016
DOI	10.1093/bioinformatics/btw155
Venue	BIOINFORMATICS
Field	Data mining, Ontology, Conversion of units, Terminology, Query expansion, Computer science, Categorical variable, Source code, Software, Bioinformatics, Documentation
DocType	Journal
Volume	32
Issue	14
ISSN	1367-4803
Citations	3
PageRank	0.48
References	6
Authors	14

Authors (14 rows)

Name	Order	Citations	PageRank
Chao Pang	1	143	19.04
David van Enckevort	2	5	2.90
Mark de Haan	3	9	2.06
Fleur Kelpin	4	9	1.72
Jonathan Jetten	5	9	1.39
Dennis Hendriksen	6	16	2.41
Tommy de Boer	7	9	1.72
Bart Charbon	8	9	1.72
Erwin Winder	9	3	0.48
K. Joeri van der Velde	10	79	6.58
Dany Doiron	11	3	0.48
Isabel Fortier	12	3	0.48
Hans L. Hillege	13	13	2.24
Morris A Swertz	14	155	18.03
