Title
A unified STR profiling system across multiple species with whole genome sequencing data.
Abstract
Short tandem repeats (STRs) serve as genetic markers in forensic scenes due to their high polymorphism in eukaryotic genomes. A variety of STRs profiling systems have been developed for species including human, dog, cat, cattle, etc. Maintaining these systems simultaneously can be costly. These mammals share many high similar regions along their genomes. With the availability of the massive amount of the whole genomics data of these species, it is possible to develop a unified STR profiling system. In this study, our objective is to propose and develop a unified set of STR loci that could be simultaneously applied to multiple species. To find a unified STR set, we collected the whole genome sequence data of the concerned species and mapped them to the human genome reference. Then we extracted the STR loci across the species. From these loci, we proposed an algorithm which selected a subset of loci by incorporating the optimized combined power of discrimination. Our results show that the unified set of loci have high combined power of discrimination, >1−10−9, for both individual species and the mixed population, as well as the random-match probability, <10−7 for all the involved species, indicating that the identified set of STR loci could be applied to multiple species. We identified a set of STR loci which shared by multiple species. It implies that a unified STR profiling system is possible for these species under the forensic scenes. The system can be applied to the individual identification or paternal test of each of the ten common species which are Sus scrofa (pig), Bos taurus (cattle), Capra hircus (goat), Equus caballus (horse), Canis lupus familiaris (dog), Felis catus (cat), Ovis aries (sheep), Oryctolagus cuniculus (rabbit), and Bos grunniens (yak), and Homo sapiens (human). Our loci selection algorithm employed a greedy approach. The algorithm can generate the loci under different forensic parameters and for a specific combination of species.
Year
DOI
Venue
2019
10.1186/s12859-019-3246-y
BMC Bioinformatics
Keywords
Field
DocType
Short tandem repeats, Whole genome sequencing, Individual identification
Genome,Population,Biology,Microsatellite,Genomics,Whole genome sequencing,Genetics,Locus (genetics),Evolutionary biology,Common species,Genetic marker
Journal
Volume
Issue
ISSN
24
suppl
1471-2105
Citations 
PageRank 
References 
0
0.34
0
Authors
5
Name
Order
Citations
PageRank
Yilin Liu197.57
Jiao Xu200.34
Miaoxia Chen300.34
Changfa Wang400.68
Shuai Cheng Li518430.25