Title
Mining K-mers of Various Lengths in Biological Sequences.
Abstract
Counting how often each k-mer occurs in a biological sequence is an important step in many bioinformatics applications. However, most k-mer counting algorithms take a fixed k as input and produce k-mers of that single length, which makes analyzing a sequence across different values of k inefficient. Moreover, existing k-mer counters focus mainly on DNA sequences and pay less attention to protein sequences, even though the analysis of k-mers in protein sequences can provide substantial biological insight into structure, function, and evolution. To this end, an efficient algorithm called VLmer (Various Length k-mer mining) is proposed. It mines k-mers of various lengths, termed vl-mers, using an inverted-index technique that is orders of magnitude faster than the conventional forward-index method. To the best of our knowledge, VLmer is the first algorithm able to mine k-mers of various lengths in both DNA and protein sequences.
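To make the idea of mining k-mers of several lengths in one pass more concrete, here is a minimal Python sketch. It is not the authors' VLmer implementation, which the abstract does not describe in detail. It only illustrates one plausible reading of an inverted-index approach: store the start positions of each k-mer, then grow every k-mer by one symbol using those positions instead of rescanning the sequence for each k. The function name `mine_vl_mers` and the parameters `k_min`, `k_max`, and `min_count` are assumptions introduced for illustration.

```python
from collections import defaultdict


def mine_vl_mers(seq, k_min, k_max, min_count=1):
    """Count every k-mer of length k_min..k_max occurring at least min_count times.

    Illustrative sketch only (hypothetical helper, not the published VLmer code).
    Works on any string alphabet, so DNA and protein sequences are both accepted.
    """
    if k_min < 1 or k_max < k_min:
        raise ValueError("need 1 <= k_min <= k_max")

    # Inverted index for the shortest length: k-mer -> list of start positions.
    index = defaultdict(list)
    for i in range(len(seq) - k_min + 1):
        index[seq[i:i + k_min]].append(i)
    index = {m: p for m, p in index.items() if len(p) >= min_count}

    counts = {}
    k = k_min
    while index:
        counts.update((mer, len(pos)) for mer, pos in index.items())
        if k == k_max:
            break
        # Extend each surviving k-mer by one symbol using its recorded positions.
        # A (k+1)-mer cannot occur more often than its length-k prefix, so
        # k-mers pruned by min_count never need to be extended.
        longer = defaultdict(list)
        for mer, positions in index.items():
            for i in positions:
                if i + k < len(seq):
                    longer[mer + seq[i + k]].append(i)
        index = {m: p for m, p in longer.items() if len(p) >= min_count}
        k += 1
    return counts
```

For example, `mine_vl_mers("ACGTACGT", 2, 3, min_count=2)` keeps only the 2-mers and 3-mers that occur at least twice, and the same call works unchanged on a protein string such as `"MKVMKV"`.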
Year: 2017
DOI: 10.1007/978-3-319-59575-7_17
Venue: BIOINFORMATICS RESEARCH AND APPLICATIONS (ISBRA 2017)
Keywords: Sequential pattern mining, K-mer counting, K-mers of various lengths, Biological sequence analysis
Field: Orders of magnitude (numbers), Computer science, Algorithm, DNA, DNA sequencing, Artificial intelligence, Sequential Pattern Mining, Machine learning, Sequence analysis
DocType: Conference
Volume: 10330
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 16
Authors: 7
Name              Order   Citations   PageRank
Jingsong Zhang    1       38          3.26
Jianmei Guo       2       390         22.80
Xiaoqing Yu       3       75          11.53
Xiangtian Yu      4       38          4.81
Wei-feng Guo      5       8           2.56
Tao Zeng          6       31          9.08
Luonan Chen       7       1485        145.71