Title
A generalized lattice model for clustering metagenomic sequences
Abstract
Metagenomics involves the analysis of genomes of microorganisms sampled directly from their environment. Next Generation Sequencing (NGS) technologies allow high-throughput sampling of small segments from the genomes in a metagenome, generating a large number of reads. To study the properties and relationships of the microorganisms present, it is important to cluster the sampled reads into groups of similar species. Clustering can be performed either by mapping the sampled reads to known sequencing databases, which hinders the discovery of new species, or based on the inherent composition of the sampled reads. We propose a two-dimensional lattice-based probabilistic model for clustering metagenomic datasets. The probability of a species in the metagenome is defined as a lattice model of probability distributions over short genomic sequences (or words). The two dimensions denote distributions for different sizes and groups of words, respectively. The lattice structure allows a node to draw additional support from its neighbors when the probabilistic support for the species in the current node is deemed insufficient. Unlike other popular clustering algorithms such as Scimm, our algorithm guarantees convergence. We test our algorithm on simulated metagenomic data containing bacterial species and observe more than 85% precision. We also evaluate our algorithm on an in vitro-simulated bacterial metagenome and show better clustering even for short reads and varied abundance. The software and datasets can be downloaded from https://github.com/lattclus/lattice-metage.
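As a rough illustration of composition-based clustering in general, the sketch below estimates a smoothed distribution over words of one fixed size for each cluster and assigns a read to the cluster under which its words are most likely. It is not the lattice model from the paper: the function names, the add-one pseudocount, the single word size, and the toy reads are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import log

ALPHABET = "ACGT"

def kmer_counts(seq, k):
    """Count overlapping words of length k, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(ch in ALPHABET for ch in word):
            counts[word] += 1
    return counts

def word_distribution(reads, k, pseudocount=1.0):
    """Smoothed probability of every one of the 4**k words (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        counts.update(kmer_counts(read, k))
    total = sum(counts.values()) + pseudocount * len(ALPHABET) ** k
    words = ("".join(w) for w in product(ALPHABET, repeat=k))
    return {w: (counts[w] + pseudocount) / total for w in words}

def log_likelihood(read, dist, k):
    """Sum of log-probabilities of the read's words under one cluster's distribution."""
    return sum(n * log(dist[w]) for w, n in kmer_counts(read, k).items())

def assign(read, cluster_dists, k):
    """Index of the cluster whose word distribution gives the read the highest score."""
    scores = [log_likelihood(read, d, k) for d in cluster_dists]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (invented): one AT-rich and one GC-rich cluster.
clusters = [["AAAAAAAA", "AAAATAAA"], ["GCGCGCGC", "GCGGCGCC"]]
dists = [word_distribution(reads, k=2) for reads in clusters]
at_rich_choice = assign("AAAAAT", dists, k=2)
gc_rich_choice = assign("GCGCGG", dists, k=2)
```

In this sketch a single word size is used; a model along the lines described in the abstract would maintain distributions for several word sizes and word groups and fall back to neighboring ones when support is weak.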
Year
2015
DOI
10.1145/2808719.2814843
Venue
BCB
DocType
Conference
Citations
1
PageRank
0.36
References
11
Authors
3
Name                  Order  Citations  PageRank
Manjari Mukhopadhyay  1      1          0.36
Raunaq Malhotra       2      5          3.17
Raj Acharya           3      347        55.42