Abstract | ||
---|---|---|
Counting the occurrence frequency of each k-mer in a biological sequence is an important step in many bioinformatics applications. However, most k-mer counting algorithms take a fixed k as input and produce k-mers of that single length, which is inefficient when a sequence must be analyzed for several values of k. Moreover, existing k-mer counters focus mainly on DNA sequences and much less on protein sequences, even though analyzing k-mers in protein sequences can provide substantial biological insight into structure, function and evolution. To this end, an efficient algorithm called VLmer (Various Length k-mer mining) is proposed to mine k-mers of various lengths, termed vl-mers, via an inverted-index technique, which is orders of magnitude faster than the conventional forward-index method. To the best of the authors' knowledge, VLmer is the first algorithm able to mine k-mers of various lengths in both DNA and protein sequences. |
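
The idea of counting k-mers of several lengths from position lists can be illustrated with a minimal sketch. This is not the VLmer implementation, which the abstract does not describe in detail; the function name `count_vlmers` and its parameters `k_min` and `k_max` are hypothetical, and the position-extension strategy is only one plausible reading of "inverted index". The alphabet is left unrestricted, so the same code accepts DNA or protein strings.

```python
from collections import defaultdict


def count_vlmers(seq, k_min, k_max):
    """Count every k-mer of seq with k_min <= k <= k_max.

    Illustrative sketch only (hypothetical helper, not VLmer itself):
    an inverted index maps each k-mer to the start positions where it
    occurs, and (k+1)-mers are derived by extending those positions by
    one symbol instead of rescanning the sequence for each k.
    """
    if k_min < 1 or k_max < k_min:
        raise ValueError("require 1 <= k_min <= k_max")

    # Inverted index for the shortest length: k-mer -> start positions.
    index = defaultdict(list)
    for start in range(len(seq) - k_min + 1):
        index[seq[start:start + k_min]].append(start)

    counts = {}
    k = k_min
    while index:
        # The count of a k-mer is the length of its position list.
        for kmer, positions in index.items():
            counts[kmer] = len(positions)
        if k == k_max:
            break
        # Extend each occurrence by one symbol to get the (k+1)-mers.
        longer = defaultdict(list)
        for kmer, positions in index.items():
            for start in positions:
                end = start + k
                if end < len(seq):
                    longer[kmer + seq[end]].append(start)
        index = longer
        k += 1
    return counts


dna_counts = count_vlmers("ACGACG", 2, 3)
protein_counts = count_vlmers("MKMK", 1, 2)
```

Because each longer k-mer is built from the stored positions of a shorter one, the sequence is scanned in full only once, for `k_min`; whether this matches the speed-up reported for VLmer is not something this sketch can show.
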
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-319-59575-7_17 | BIOINFORMATICS RESEARCH AND APPLICATIONS (ISBRA 2017) |
Keywords | Field | DocType |
Sequential pattern mining, K-mer counting, K-mers of various lengths, Biological sequence analysis | Orders of magnitude (numbers), Computer science, Algorithm, DNA, DNA sequencing, Artificial intelligence, Sequential Pattern Mining, Machine learning, Sequence analysis | Conference |
Volume | ISSN | Citations |
10330 | 0302-9743 | 0 |
PageRank | References | Authors |
0.34 | 16 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jingsong Zhang | 1 | 38 | 3.26 |
Jianmei Guo | 2 | 390 | 22.80 |
Xiaoqing Yu | 3 | 75 | 11.53 |
Xiangtian Yu | 4 | 38 | 4.81 |
Wei-feng Guo | 5 | 8 | 2.56 |
Tao Zeng | 6 | 31 | 9.08 |
Luonan Chen | 7 | 1485 | 145.71 |