Title | ||
---|---|---|
Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata. |
Abstract | ||
---|---|---|
Our algorithm identified keys that were similar to each other and grouped them together. The intuition behind cleaning by clustering is that dividing keys into clusters resolves the scalability issues of data observation and cleaning, and that duplicates and errors among keys in the same cluster can then be found easily. Our algorithm can also be applied to other biomedical data types. |
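
The grouping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple greedy strategy, a string-similarity measure from Python's standard `difflib` module, and an arbitrary threshold of 0.8. The function names (`normalize`, `similarity`, `cluster_keys`) and the sample keys are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def normalize(key: str) -> str:
    """Lowercase a key and treat '_' and '-' as spaces (an assumed normalization)."""
    return " ".join(key.lower().replace("_", " ").replace("-", " ").split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized keys."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_keys(keys, threshold=0.8):
    """Greedily assign each key to the first cluster whose first member is similar enough.

    The threshold is an assumption for this sketch, not a value from the paper.
    """
    clusters = []
    for key in keys:
        for cluster in clusters:
            if similarity(key, cluster[0]) >= threshold:
                cluster.append(key)
                break
        else:
            # No existing cluster matched, so this key starts a new one.
            clusters.append([key])
    return clusters


# Hypothetical metadata keys containing spelling variants and a typo.
sample_keys = ["cell line", "Cell_Line", "cell-line", "tissue", "tisue", "age"]
clusters = cluster_keys(sample_keys)
for cluster in clusters:
    print(cluster)
```

With keys grouped this way, a curator only has to inspect each small cluster for duplicates and misspellings instead of scanning the full key list, which is the scalability benefit the abstract describes.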
Year | DOI | Venue |
---|---|---|
2017 | 10.1186/s12859-017-1832-4 | BMC Bioinformatics |
Keywords | Field | DocType |
Biomedical,Clustering,Data quality,Experimental data,GEO,Metadata,Reusability | Metadata,Metadata repository,Data quality,Information retrieval,Computer science,Meta Data Services,Data element,Controlled vocabulary,Bioinformatics,Metadata modeling,Cluster analysis | Journal |
Volume | Issue | ISSN |
18 | 1 | 1471-2105 |
Citations | PageRank | References |
1 | 0.37 | 13 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yuzhong Qu | 1 | 726 | 62.49 |
Amrapali Zaveri | 2 | 368 | 24.37 |
Honglei Qiu | 3 | 13 | 1.37 |
Michel Dumontier | 4 | 898 | 93.35 |