Abstract |
---|
Dimension reduction (DR) methods play an essential role in analyzing and visualizing high-dimensional multi-source data. In recent decades, many variants of these methods have been developed across various disciplines and domains. Given the diversity and ever-increasing number of available techniques, choosing the right method for a given problem is difficult. In this study, we benchmark 87 methods for integrative dimension reduction of mRNA expression and DNA methylation data, a common problem in biology and medicine. Our ranking is based on four main factors, evaluated on multiple datasets generated by InterSIM (a semi-realistic multi-source data simulator for the genomics domain): quality of dimension reduction (local, global, and local-global neighborhood preservation), clustering quality, speed, and sensitivity to input parameters. The results are then validated on a real breast cancer dataset using visual evaluation tools such as co-ranking matrices, inspection of true cancer subtypes in two-dimensional projections, and LCMC (local continuity meta-criterion) curves. We also examine the relationships between the methods via network inference. The findings of this study can be useful for algorithm selection and experimental design in multi-source data analysis. |
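The co-ranking matrix and LCMC curve used for validation above have standard rank-based definitions. For $N$ points, let $\rho_{ij}$ and $r_{ij}$ be the rank of point $j$ among the neighbors of point $i$ in the original and the reduced space. The co-ranking matrix counts pairs with $Q_{kl} = |\{(i,j) : \rho_{ij} = k,\ r_{ij} = l\}|$, and $\mathrm{LCMC}(K) = \frac{1}{KN}\sum_{k=1}^{K}\sum_{l=1}^{K} Q_{kl} - \frac{K}{N-1}$. The following is a minimal NumPy sketch of these definitions. It assumes Euclidean distances and index-based tie breaking, and uses hypothetical function names (`rank_matrix`, `coranking_matrix`, `lcmc`). It is not the authors' evaluation code.

```python
import numpy as np


def rank_matrix(X):
    """ranks[i, j] = rank of point j among the neighbors of point i (self = 0).

    Assumes Euclidean distances; ties are broken by point index (stable sort).
    """
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, -1.0)  # force each point to be its own rank-0 neighbor
    order = np.argsort(d, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(n)[:, None]
    ranks[rows, order] = np.arange(n)[None, :]  # invert the permutation row-wise
    return ranks


def coranking_matrix(X_high, X_low):
    """Q[k-1, l-1] = number of pairs (i, j) with high-dim rank k and low-dim rank l."""
    n = X_high.shape[0]
    r_high = rank_matrix(X_high)
    r_low = rank_matrix(X_low)
    off_diag = ~np.eye(n, dtype=bool)  # ignore self-pairs (rank 0)
    Q = np.zeros((n - 1, n - 1), dtype=int)
    np.add.at(Q, (r_high[off_diag] - 1, r_low[off_diag] - 1), 1)
    return Q


def lcmc(Q, K):
    """Local continuity meta-criterion for neighborhood size K."""
    n = Q.shape[0] + 1
    q_nx = Q[:K, :K].sum() / (K * n)  # average fraction of preserved K-neighbors
    return q_nx - K / (n - 1)  # subtract the value expected for a random embedding


if __name__ == "__main__":
    # Toy usage: compare random 20-D data with its 2-D PCA projection.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:2].T
    Q = coranking_matrix(X, Y)
    print([round(lcmc(Q, k), 3) for k in (5, 10, 20)])
```

Evaluating `lcmc` over a range of `K` gives an LCMC curve. A perfect embedding yields a diagonal co-ranking matrix and the upper bound $1 - K/(N-1)$.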
Property | Value |
---|---|
Year | 2019 |
DOI | 10.1016/j.ins.2019.04.041 |
Venue | Information Sciences |
Keywords | Performance evaluation, Dimension reduction, Multi-source data, Data fusion, Matrix factorization |
Field | Dimensionality reduction, Ranking, Inference, Genomics, Artificial intelligence, Algorithm Selection, Cluster analysis, Mathematics, Machine learning |
DocType | Journal |
Volume | 493 |
ISSN | 0020-0255 |
Citations | 2 |
PageRank | 0.46 |
References | 0 |
Authors | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hadi Fanaee-T | 1 | 75 | 8.55 |
Magne Thoresen | 2 | 12 | 4.14 |