Title
Statistical Methods for Ambiguous Sequence Mappings
Abstract
Mapping RNA sequences to a reference genome often assigns a high percentage of short reads to multiple locations within the genome. These assignments are known as "ambiguous mappings" and are often discarded by sequence mapping tools and pipelines. The number of ambiguous mappings in a data set can be substantial, in some cases accounting for as much as one third of the mapped sequences. We are developing task-specific computer programs that use statistical methods as an alternative to discarding these reads. The approach is based on identifying significantly expressed genomic locations. We handle ambiguous data in a multi-step process. First, a standard short read alignment tool identifies all possible mappings within the genome for each sequence read. Custom programs then identify expressed genomic locations statistically by comparing gene expression in the regions of interest with expression at a number of randomly selected genomic locations. These comparisons help establish a threshold above which a gene is considered significantly expressed, and they determine which location is most likely to be the best mapping for each ambiguous sequence.
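The comparison against randomly selected locations can be illustrated with a minimal sketch. The abstract does not name the statistical test, so this example assumes an empirical quantile of read coverage over random windows as the significance threshold, and it assumes a precomputed per-base coverage list. All function names and parameters (`window_coverage`, `empirical_threshold`, `best_mapping`, `n_random`, `alpha`) are hypothetical and are not taken from the authors' programs.

```python
import random


def window_coverage(coverage, start, width):
    """Sum per-base coverage over the window [start, start + width)."""
    return sum(coverage[start:start + width])


def empirical_threshold(coverage, width, n_random=1000, alpha=0.05, seed=0):
    """Return the coverage value at the (1 - alpha) quantile of random windows.

    Assumption: random windows are drawn uniformly over the whole sequence.
    """
    rng = random.Random(seed)
    max_start = len(coverage) - width
    background = sorted(
        window_coverage(coverage, rng.randint(0, max_start), width)
        for _ in range(n_random)
    )
    idx = min(len(background) - 1, int((1 - alpha) * len(background)))
    return background[idx]


def best_mapping(candidates, coverage, width, threshold):
    """Pick the candidate start with the highest coverage above threshold.

    Returns None when no candidate location is significantly expressed.
    """
    scored = [(window_coverage(coverage, s, width), s) for s in candidates]
    significant = [(c, s) for c, s in scored if c > threshold]
    return max(significant)[1] if significant else None


# Toy data: a 10,000-base sequence with one expressed region at 200..249.
toy_coverage = [0] * 10_000
for pos in range(200, 250):
    toy_coverage[pos] = 10

toy_threshold = empirical_threshold(toy_coverage, width=25)
# One ambiguous read with two candidate start positions.
print(best_mapping([210, 5000], toy_coverage, 25, toy_threshold))
```

In this sketch a read whose candidate locations all fall below the threshold stays unassigned. That is a design choice made for the illustration, not a behavior described in the abstract.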
Year: 2013
DOI: 10.1145/2506583.2506678
Venue: BCB
Keywords: ambiguous sequence, sequence mapping tool, ambiguous data, mapped sequence, ambiguous sequence mappings, statistical methods, mapping rna sequence, reference genome, randomly-selected genomic location, statistical method, genomic location, ambiguous mapping, rna seq
Field: Genome, Data set, Gene, RNA-Seq, Computer science, Bioinformatics, Reference genome
DocType: Conference
Citations: 1
PageRank: 0.35
References: 0
Authors: 6
Name                  Order  Citations  PageRank
Tamer Aldwairi        1      6          0.77
Bindu Nanduri         2      108        8.72
Mahalingam Ramkumar   3      141        23.99
Dilip Gautam          4      1          0.35
Michael T. Johnson    5      435        53.51
Andy Perkins          6      4          1.26