Abstract | ||
---|---|---|
Machine learning clustering algorithms provide excellent methods for conducting metagenomic analysis with efficiency. This study uses two machine learning algorithms, the self-organizing map and the K-means algorithms, to cluster data from an environmental sample collected from a hot springs habitat and to provide a visual analysis of that data. A data processing pipeline is described that uses the clustering algorithms to identify which reference genomes should be included for further analysis in determining possible organisms that are present in a metagenomic sample. The clustering revealed probable candidates for additional analysis, including a thermophilic, anaerobic bacterium, which is likely to be found in a hot springs environment and serves to validate the functionality of these tools. The machine learning techniques discussed here can serve as a launching point for elucidating protein sequences that could serve as possible reference comparisons to a specific metagenomic sample and lead to further study. |
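
As a rough illustration of the K-means step mentioned in the abstract, the sketch below clusters a few toy DNA strings. It is not the authors' pipeline: the abstract does not say which features were clustered, so the dinucleotide-frequency profile (`kmer_profile`), the toy `reads`, and all function names here are assumptions made for illustration. The self-organizing map step is not shown.

```python
from itertools import product
import random


def kmer_profile(seq, k=2):
    """Normalized k-mer frequencies of a DNA string (assumed feature choice)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:  # skip k-mers containing ambiguous bases such as N
            counts[sub] += 1
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in kmers]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's-algorithm K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Toy input: two AT-rich and two GC-rich strings (hypothetical data).
reads = ["ATATATATATAT", "TATATATATATA", "GCGCGCGCGCGC", "CGCGCGCGCGCG"]
profiles = [kmer_profile(r) for r in reads]
labels, centroids = kmeans(profiles, k=2)
```

On this toy input the two AT-rich strings end up in one cluster and the two GC-rich strings in the other. With real metagenomic data, the choice of features and of `k` would need to follow the paper's own method.
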
Year | DOI | Venue |
---|---|---|
2016 | 10.1080/10798587.2015.1073887 | Intelligent Automation and Soft Computing |
Keywords | Field | DocType |
---|---|---|
Metagenomics, Clustering, K-means, Machine learning, Self-organizing map | Data mining, k-means clustering, Data processing, Computer science, Self-organizing map, Metagenomics, Artificial intelligence, Anaerobic bacterium, Cluster analysis, Machine learning | Journal |
Volume | Issue | ISSN |
---|---|---|
22 | 1 | 1079-8587 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 10 |
Authors | ||
---|---|---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Damien Ennis | 1 | 0 | 0.34 |
Sergiu Dascalu | 2 | 362 | 79.10 |
Frederick C. Harris Jr. | 3 | 547 | 78.86 |