Title
Assembly independent functional annotation of short-read data using SOFA: Short-ORF functional annotation
Abstract
Despite the rise of next-generation sequencing technologies, assembly-based approaches cannot yet effectively describe the microbial communities that drive matter and energy transformations in complex ecosystems such as soils. Here we present SOFA, an open-source pipeline enabling comparative functional annotation of unassembled short-read data. The pipeline attempts to merge mate pairs in FASTQ files, predicts open reading frames (ORFs) on merged and unmerged reads as short as 70 bp, and then performs an additional step we term "deduplication". Deduplication prevents double counting of ORFs predicted from unmerged paired-end reads by checking for homologous annotations that span the same ORF, which allows quantitatively accurate predictions. SOFA is validated on both simulated and bona fide soil metagenomes, and its empirical results are compared with existing strategies for obtaining accurate ORF counts and with an analytical model of read duplication. SOFA enables downstream processing stages within the existing MetaPathways pipeline and is available for download as a standalone application at https://github.com under the MIT license.
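The deduplication step can be illustrated with a minimal Python sketch. This is not SOFA's implementation: the record layout (`OrfHit`), the reference IDs, and the rule that two unmerged mates annotated to the same reference are treated as one ORF are hypothetical assumptions. Only the 70 bp floor and the merged/unmerged distinction come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical minimum ORF length, mirroring the 70 bp figure in the abstract.
MIN_ORF_NT = 70


@dataclass(frozen=True)
class OrfHit:
    """One predicted ORF and its best annotation (hypothetical record layout)."""
    pair_id: str                # ID shared by both mates of a read pair
    mate: str                   # "merged", "R1" or "R2"
    orf_length_nt: int          # length of the predicted ORF in nucleotides
    annotation: Optional[str]   # best homolog hit (reference ID), or None


def count_annotations(hits: Iterable[OrfHit]) -> Counter:
    """Count annotations, counting an unmerged pair at most once per reference.

    Assumption: if both mates of an unmerged pair hit the same reference,
    they are taken to span the same ORF and contribute a single count.
    """
    counts: Counter = Counter()
    seen = set()  # (pair_id, annotation) keys already counted from unmerged mates
    for hit in hits:
        if hit.orf_length_nt < MIN_ORF_NT or hit.annotation is None:
            continue  # too short or unannotated: not counted
        if hit.mate == "merged":
            counts[hit.annotation] += 1  # a merged read yields one fragment
            continue
        key = (hit.pair_id, hit.annotation)
        if key in seen:
            continue  # the other mate already counted this homolog
        seen.add(key)
        counts[hit.annotation] += 1
    return counts


# Hypothetical example data.
EXAMPLE_HITS = [
    OrfHit("p1", "merged", 150, "refA"),
    OrfHit("p2", "R1", 90, "refA"),   # both mates hit refA: counted once
    OrfHit("p2", "R2", 96, "refA"),
    OrfHit("p3", "R1", 84, "refB"),   # mates hit different references:
    OrfHit("p3", "R2", 75, "refC"),   # both are counted
    OrfHit("p4", "R1", 60, "refB"),   # below 70 nt: filtered out
]

if __name__ == "__main__":
    print(dict(count_annotations(EXAMPLE_HITS)))
```

Without the `seen` check, `refA` would be counted three times instead of twice, which is the double counting the abstract describes. A real pipeline would likely also compare alignment coordinates on the reference rather than relying on the annotation ID alone.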
Year: 2015
DOI: 10.1109/CIBCB.2015.7300324
Venue: 2015 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)
Keywords: assembly independent functional annotation, short read data, SOFA, Short-ORF functional annotation, energy transformation, complex ecosystems, soils, open source pipeline, deduplication, MetaPathways pipeline
Field: Data deduplication, Annotation, Computer science, FASTQ format, MIT License, Bioinformatics, ORFS, Merge (version control)
DocType: Conference
Citations: 1
PageRank: 0.37
References: 12
Authors: 5
Name | Order | Citations | PageRank
Aria S. Hahn | 1 | 4 | 1.80
Niels W. Hanson | 2 | 29 | 3.58
Dongjae Kim | 3 | 4 | 1.46
Kishori M. Konwar | 4 | 107 | 17.49
Steven J. Hallam | 5 | 34 | 3.97