Title
Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.
Abstract
Our algorithm identifies keys that are similar to each other and groups them together. The intuition behind cleaning by clustering is that dividing keys into clusters resolves the scalability issues of data observation and cleaning, and that duplicates and errors among keys in the same cluster can then be found easily. The algorithm can also be applied to other biomedical data types.
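The grouping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the similarity measure (Python's standard `difflib.SequenceMatcher`), the 0.8 threshold, the greedy single-pass strategy, the helper names, and the sample keys are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(key: str) -> str:
    """Lowercase a key and collapse '_' / '-' / repeated spaces (assumed normalization)."""
    return " ".join(key.lower().replace("_", " ").replace("-", " ").split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized keys."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_keys(keys, threshold=0.8):
    """Greedy single-pass clustering (hypothetical helper).

    Each key joins the first cluster whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for key in keys:
        for cluster in clusters:
            if similarity(key, cluster[0]) >= threshold:
                cluster.append(key)
                break
        else:
            clusters.append([key])
    return clusters


# Made-up metadata keys, not taken from GEO.
keys = ["cell line", "cell_line", "Cell-Line", "age", "Age", "tissue"]
clusters = cluster_keys(keys)
print(clusters)
```

Once keys are grouped this way, a curator only needs to inspect each cluster for duplicates and misspellings rather than scanning the full key list, which is the scalability argument the abstract makes.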
Year: 2017
DOI: 10.1186/s12859-017-1832-4
Venue: BMC Bioinformatics
Keywords: Biomedical, Clustering, Data quality, Experimental data, GEO, Metadata, Reusability
Field: Metadata, Metadata repository, Data quality, Information retrieval, Computer science, Meta Data Services, Data element, Controlled vocabulary, Bioinformatics, Metadata modeling, Cluster analysis
DocType: Journal
Volume: 18
Issue: 1
ISSN: 1471-2105
Citations: 1
PageRank: 0.37
References: 13
Authors: 4
Name               Order  Citations  PageRank
Yuzhong Qu         1      726        62.49
Amrapali Zaveri    2      368        24.37
Honglei Qiu        3      13         1.37
Michel Dumontier   4      898        93.35