Abstract |
---|
Metagenomics involves the analysis of genomes of microorganisms sampled directly from their environment. Next Generation Sequencing (NGS) technologies allow high-throughput sampling of small segments from the genomes in a metagenome, generating a large number of reads. To study the properties and relationships of the microorganisms present, it is important to cluster the sampled reads into groups of similar species. Clustering can be performed either by mapping the sampled reads to known sequence databases, which hinders the discovery of new species, or by using the inherent composition of the sampled reads. We propose a two-dimensional lattice-based probabilistic model for clustering metagenomic datasets. The probability of a species in the metagenome is defined as a lattice model of probability distributions over short genomic sequences (words). The two dimensions denote distributions for different sizes and groups of words, respectively. The lattice structure lets a node draw additional support from its neighbors when the probabilistic support for the species in the current node is deemed insufficient. Unlike other popular clustering algorithms such as Scimm, our algorithm guarantees convergence. We test our algorithm on simulated metagenomic data containing bacterial species and observe more than 85% precision. We also evaluate our algorithm on an in vitro-simulated bacterial metagenome and show better clustering even for short reads and varied abundance. The software and datasets can be downloaded from https://github.com/lattclus/lattice-metage. |
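
The abstract describes composition-based clustering in terms of distributions over short words of different sizes. The following is a minimal sketch of that general idea only, not the authors' lattice model: it computes relative word (k-mer) frequencies for several word sizes. The function names `kmer_distribution` and `composition_profile`, the default word sizes, and the choice to skip words containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_distribution(seq, k):
    """Return the relative frequency of every length-k word over ACGT in seq.

    Hypothetical helper for illustration; not taken from the paper's software.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)  # skip words with ambiguous bases
    )
    total = sum(counts.values())
    words = ("".join(p) for p in product(ALPHABET, repeat=k))
    # Words that never occur get frequency 0.0; no smoothing is applied here.
    return {w: (counts[w] / total if total else 0.0) for w in words}


def composition_profile(seq, word_sizes=(2, 3, 4)):
    """Map each word size to its word-frequency distribution (assumed sizes)."""
    return {k: kmer_distribution(seq, k) for k in word_sizes}
```

For example, `composition_profile(read)` returns one distribution per word size, which a composition-based method could compare against per-cluster distributions. How the paper groups words, combines word sizes, and borrows support from neighboring lattice nodes is not specified in the abstract and is not modeled here.
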
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2808719.2814843 | BCB |

DocType | Citations | PageRank |
---|---|---|
Conference | 1 | 0.36 |

References | Authors |
---|---|
11 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Manjari Mukhopadhyay | 1 | 1 | 0.36 |
Raunaq Malhotra | 2 | 5 | 3.17 |
Raj Acharya | 3 | 347 | 55.42 |