Title
Assemble CRISPRs from metagenomic sequencing data.
Abstract
Motivation: Clustered regularly interspaced short palindromic repeats and associated proteins (CRISPR-Cas) allow more specific and efficient gene editing than all previous genetic engineering systems. These exciting discoveries stem from the finding that the CRISPR system is an adaptive immune system that protects prokaryotes against exogenous genetic elements such as phages. Despite these discoveries, almost all knowledge about CRISPRs is based only on microorganisms that can be isolated, cultured and sequenced in labs. However, about 95% of bacterial species cannot be cultured in labs. The fast accumulation of metagenomic data, which contains DNA sequences of microbial species from natural samples, provides a unique opportunity for CRISPR annotation in uncultivable microbial species. However, the large amount of data, heterogeneous coverage and shared leader sequences of some CRISPRs pose challenges for identifying CRISPRs efficiently in metagenomic data.
Results: In this study, we developed a CRISPR-finding tool for metagenomic data that does not rely on generic assembly, which is error-prone and computationally expensive for complex data. Our tool can run on commonly available machines in small labs. It employs properties of CRISPRs to decompose generic assembly into local assembly. We tested it on both mock and real metagenomic data and benchmarked its performance against state-of-the-art tools.
Year: 2016
DOI: 10.1093/bioinformatics/btw456
Venue: BIOINFORMATICS
Field: Data mining, Computer science, Metagenomics, Computational biology, CRISPR
DocType: Journal
Volume: 32
Issue: 17
ISSN: 1367-4803
Citations: 0
PageRank: 0.34
References: 6
Authors: 2
Name, Order, Citations, PageRank
Jikai Lei, 1, 26, 3.64
Yanni Sun, 2, 219, 21.16