Title
Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.
Abstract
Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks.
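The nested containment list (NCList) idea named in the abstract can be illustrated with a minimal in-memory sketch. This is not the MyNCList MySQL schema, which the abstract does not describe. It only shows the general technique: intervals contained in another interval are moved into that interval's sublist, so each list is sorted by both start and end and can be searched by bisection. The names `Node`, `build_nclist` and `query`, the half-open `[start, end)` coordinates, and the sample annotations are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    start: int
    end: int                     # half-open interval [start, end) (assumed convention)
    label: object
    children: list = field(default_factory=list)
    child_ends: list = field(default_factory=list)  # ends of children, for bisection


def build_nclist(intervals):
    """Build a nested containment list from (start, end, label) tuples."""
    # Sentinel root that contains every interval.
    root = Node(float("-inf"), float("inf"), None)
    stack = [root]
    # Sort by start ascending, then end descending, so containers come first.
    for start, end, label in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        node = Node(start, end, label)
        # Pop until the top of the stack contains the new interval.
        while not (stack[-1].start <= start and end <= stack[-1].end):
            stack.pop()
        parent = stack[-1]
        parent.children.append(node)
        parent.child_ends.append(end)
        stack.append(node)
    return root


def query(node, qstart, qend):
    """Return labels of all intervals overlapping the half-open range [qstart, qend)."""
    hits = []
    # Within one sublist ends are increasing, so bisect to the first end > qstart.
    i = bisect_right(node.child_ends, qstart)
    # Starts are non-decreasing, so stop at the first start >= qend.
    while i < len(node.children) and node.children[i].start < qend:
        child = node.children[i]
        hits.append(child.label)
        hits.extend(query(child, qstart, qend))
        i += 1
    return hits


# Hypothetical annotations on one chromosome.
annotations = [
    (100, 500, "geneA"),
    (100, 200, "exon1"),
    (300, 400, "exon2"),
    (350, 351, "snp"),
    (450, 900, "geneB"),
]
index = build_nclist(annotations)
print(query(index, 350, 351))  # annotations overlapping a single-base variant
```

Because a sublist is only visited when its parent overlaps the query, the search skips every interval that ends before the query begins instead of scanning them, which is the property that a plain index on the start coordinate does not provide.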
Year
2013
DOI
10.1093/database/bat056
Venue
Database: The Journal of Biological Databases and Curation
Keywords
genomics, database management systems, algorithms, search engine
Field
Data mining, Data structure, Annotation, Database model, Information retrieval, Computer science, Search engine indexing, Database schema, Database design, Relational database management system, Database theory, Database
DocType
Journal
Volume
2013
ISSN
1758-0463
Citations
1
PageRank
0.37
References
3
Authors
3
Name                 Order   Citations   PageRank
Laura K. Wiley       1       1           0.37
R. Michael Sivley    2       1           1.05
William S. Bush      3       161         18.45