Title
SolidBin: Improving Metagenome Binning with Semi-supervised Normalized Cut.
Abstract
Motivation: Metagenomic contig binning is an important computational problem in metagenomic research. It aims to cluster contigs from the same genome into the same group. Unlike classical clustering problems, contig binning can exploit known relationships among some of the contigs, or the taxonomic identity of some contigs. However, current state-of-the-art contig binning methods make little use of biological information beyond the coverage and sequence composition of the contigs.
Results: We developed a novel contig binning method, Semi-supervised Spectral Normalized Cut for Binning (SolidBin), based on semi-supervised spectral clustering. Using sequence feature similarity and/or additional biological information, such as reliable taxonomy assignments for some contigs, SolidBin constructs two types of prior information: must-link and cannot-link constraints. A must-link constraint states that a pair of contigs should be clustered into the same group; a cannot-link constraint states that a pair of contigs should be clustered into different groups. These constraints are then integrated into a classical spectral clustering approach, normalized cut, for improved contig binning. SolidBin is compared with five state-of-the-art genome binners (CONCOCT, COCACOLA, MaxBin, MetaBAT and BMC3C) on five next-generation sequencing benchmark datasets, including simulated multi-sample and single-sample datasets and real multi-sample datasets. The experimental results show that SolidBin achieves the best performance in terms of F-score, Adjusted Rand Index and Normalized Mutual Information, especially on the real datasets and the single-sample dataset.
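The idea of must-link and cannot-link constraints can be illustrated with a minimal sketch. This is not the SolidBin implementation: the function names, the toy similarity matrix, and the way constraints are injected (overwriting affinity entries with 1 or 0 before a two-way normalized cut) are assumptions chosen for illustration, and the abstract does not specify how SolidBin itself integrates the constraints.

```python
from itertools import combinations

import numpy as np


def constraints_from_labels(labels):
    """Derive pairwise constraints from partial labels (None = unlabeled).

    Hypothetical helper: labeled pairs with equal labels become must-link,
    labeled pairs with different labels become cannot-link.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] is None or labels[j] is None:
            continue
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


def apply_constraints(W, must_link, cannot_link):
    """Assumed heuristic: set must-link affinities to 1, cannot-link to 0."""
    W = np.array(W, dtype=float)
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0
    return W


def normalized_cut_bipartition(W):
    """Two-way normalized cut via the symmetric normalized Laplacian.

    Assumes every node has positive degree and the graph is connected.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]    # second-smallest eigenvector
    return fiedler > 0                   # sign gives the two groups


# Toy similarity between four contigs (symmetric, zero diagonal).
W = [[0.0, 0.9, 0.5, 0.1],
     [0.9, 0.0, 0.5, 0.1],
     [0.5, 0.5, 0.0, 0.4],
     [0.1, 0.1, 0.4, 0.0]]
# Assumed taxonomy labels for some contigs; contig 1 is unlabeled.
labels = ["A", None, "B", "B"]

must_link, cannot_link = constraints_from_labels(labels)
groups = normalized_cut_bipartition(apply_constraints(W, must_link, cannot_link))
```

In this toy case the constraints link contigs 2 and 3 and separate both from contig 0, so the cut places contigs 0 and 1 in one group and contigs 2 and 3 in the other. Which group is marked `True` depends on the arbitrary sign of the eigenvector.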
Year: 2019
DOI: 10.1093/bioinformatics/btz253
Venue: BIOINFORMATICS
Field: Data mining, Normalization (statistics), Computer science, Metagenomics
DocType: Journal
Volume: 35
Issue: 21
ISSN: 1367-4803
Citations: 2
PageRank: 0.37
References: 0
Authors: 5
Name | Order | Citations | PageRank
Ziye Wang | 1 | 3 | 1.41
Zhengyang Wang | 2 | 2 | 0.37
Yang Young Lu | 3 | 13 | 2.62
Fengzhu Sun | 4 | 963 | 107.14
Shanfeng Zhu | 5 | 429 | 35.04