Abstract | ||
---|---|---|
Objective: When studying any specific rare disease, heterogeneity and scarcity of affected individuals have historically hindered investigators from discerning what to focus on to understand and diagnose a disease. New nongenomic methodologies must be developed that identify similarities in seemingly dissimilar conditions. Materials and Methods: This observational study analyzes 1042 patients from the Undiagnosed Diseases Network (2015-2019), a multicenter, nationwide research study using phenotypic data annotated by specialized staff using Human Phenotype Ontology terms. We used Louvain community detection to cluster patients linked by Jaccard pairwise similarity and 2 support vector classifiers to assign new cases. We further validated the clusters' most representative comorbidities using a national claims database (67 million patients). Results: Patients were divided into 2 groups: those with symptom onset before 18 years of age (n = 810) and at 18 years of age or older (n = 232) (average symptom onset age: 10 [interquartile range, 0-14] years). For 810 pediatric patients, we identified 4 statistically significant clusters. Two clusters were characterized by growth disorders, and developmental delay enriched for hypotonia presented a higher likelihood of diagnosis. The support vector classifier showed 0.89 balanced accuracy (0.83 for Human Phenotype Ontology terms only) on test data. Discussion: To set the framework for future discovery, we chose as our endpoint the successful grouping of patients by phenotypic similarity and provide a classification tool to assign new patients to those clusters. Conclusion: This study shows that despite the scarcity and heterogeneity of patients, we can still find commonalities that can potentially be harnessed to uncover new insights and targets for therapy. |
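
The abstract describes linking patients by Jaccard pairwise similarity over their Human Phenotype Ontology annotations before clustering. The following is a minimal sketch of that similarity step only, not the authors' implementation: the patient identifiers, the toy term sets, the helper names `jaccard` and `similarity_edges`, and the `min_similarity` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(patients: dict, min_similarity: float = 0.0) -> list:
    """Return (patient_a, patient_b, weight) for each pair whose similarity exceeds the threshold."""
    edges = []
    for (pa, terms_a), (pb, terms_b) in combinations(sorted(patients.items()), 2):
        weight = jaccard(terms_a, terms_b)
        if weight > min_similarity:  # hypothetical cutoff, not taken from the paper
            edges.append((pa, pb, weight))
    return edges


# Toy data: hypothetical patients annotated with illustrative HPO term identifiers.
patients = {
    "P1": {"HP:0001252", "HP:0001263"},
    "P2": {"HP:0001252", "HP:0001263", "HP:0004322"},
    "P3": {"HP:0004322"},
}

edges = similarity_edges(patients)
for pa, pb, weight in edges:
    print(f"{pa} - {pb}: {weight:.2f}")
```

The resulting weighted edge list is the kind of input a community detection method such as Louvain could take; the clustering and classification steps themselves are not shown here.
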
Year | DOI | Venue |
---|---|---|
2021 | 10.1093/jamia/ocab050 | JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION |
Keywords | DocType | Volume |
rare diseases, undiagnosed diseases, cluster analysis, supervised machine learning, unsupervised machine learning | Journal | 28 |
Issue | ISSN | Citations |
8 | 1067-5027 | 0 |
PageRank | References | Authors |
0.34 | 0 | 12 |
Name | Order | Citations | PageRank |
---|---|---|---|
Josephine Yates | 1 | 0 | 0.34 |
Alba Gutiérrez-Sacristán | 2 | 0 | 0.34 |
Vianney Jouhet | 3 | 9 | 4.30 |
Kimberly LeBlanc | 4 | 0 | 0.34 |
Cecilia Esteves | 5 | 0 | 0.34 |
Thomas N DeSain | 6 | 0 | 0.68 |
Nick Benik | 7 | 0 | 0.34 |
Jason Stedman | 8 | 1 | 0.69 |
Nathan Palmer | 9 | 0 | 0.68 |
Guillaume Mellon | 10 | 0 | 0.34 |
Isaac Kohane | 11 | 0 | 1.01 |
Paul Avillach | 12 | 62 | 8.56 |