Title
A perceptual hash function to store and retrieve large scale DNA sequences.
Abstract
This paper proposes a novel approach for storing and retrieving massive DNA sequences. The method is based on a perceptual hash function, a technique commonly used to determine the similarity between digital images, which we adapted for DNA sequences. The perceptual hash function presented here is based on a Discrete Cosine Transform Sign Only (DCT-SO). Each nucleotide is encoded as a pixel with a fixed gray-level intensity, and the hash is computed from the significant frequency characteristics of the resulting signal. This yields a drastic data reduction between the sequence and its perceptual hash. Unlike cryptographic hash functions, perceptual hashes are not subject to the avalanche effect (where a small change in the input flips about half of the output bits), so two hashes can be meaningfully compared. The similarity between two hashes is estimated with the Hamming distance, which is used to retrieve DNA sequences. Our experiments show that the approach is suitable for storing massive DNA sequences and retrieving them.
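The DCT-SO pipeline summarized in the abstract (nucleotide to gray level, frequency transform, sign bits, Hamming distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The gray levels in `GRAY_LEVEL` are hypothetical because the abstract does not give them. The sequence is treated as a single row of pixels and transformed with a 1-D DCT-II rather than as a 2-D image. The hash keeps only the signs of the first `n_bits` AC coefficients. Hashes are assumed to be compared only between sequences (or windows) of the same length, since the DCT basis depends on the length. All function names are illustrative.

```python
import math

# Hypothetical gray levels: the abstract states that each nucleotide maps to
# a fixed gray-level intensity but does not give the actual values.
GRAY_LEVEL = {"A": 32, "C": 96, "G": 160, "T": 224}


def encode(seq: str) -> list[float]:
    """Map a DNA string to a 1-D row of gray-level pixel intensities.

    Symbols other than A, C, G, T raise KeyError.
    """
    return [float(GRAY_LEVEL[base]) for base in seq.upper()]


def dct2(signal: list[float]) -> list[float]:
    """Unnormalized 1-D DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def dct_so_hash(seq: str, n_bits: int = 16) -> int:
    """Sign-only hash: one bit per low-frequency AC coefficient (1 if > 0).

    The DC term (k = 0) is skipped: it only encodes the mean intensity,
    which is always positive for these gray levels.
    """
    coeffs = dct2(encode(seq))
    if len(coeffs) <= n_bits:
        raise ValueError("sequence too short for the requested hash size")
    bits = 0
    for k in range(1, n_bits + 1):
        bits = (bits << 1) | (1 if coeffs[k] > 0 else 0)
    return bits


def hamming(h1: int, h2: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(h1 ^ h2).count("1")


if __name__ == "__main__":
    ref = "ACGTTGCAACGTGGCCTTAAGCTAGCTAGGAT"    # 32 bases -> 16-bit hash
    mutant = ref[:10] + "A" + ref[11:]          # single point substitution
    other = "TTTTAAAACCCCGGGGTTTTAAAACCCCGGGG"  # unrelated sequence
    h_ref = dct_so_hash(ref)
    # A point mutation is expected to flip fewer sign bits than an unrelated
    # sequence, though this sketch does not guarantee it.
    print(hamming(h_ref, dct_so_hash(mutant)))
    print(hamming(h_ref, dct_so_hash(other)))
```

Keeping only signs is what gives the data reduction: each sequence is stored as an `n_bits`-bit integer, and retrieval reduces to comparing integers with XOR and a bit count. Small edits shift coefficient magnitudes but change a sign only when a coefficient lies near zero, which is why nearby sequences tend to have nearby hashes.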
Year
2014
Venue
CoRR
Field
Hash tree, Double hashing, Rolling hash, Theoretical computer science, Artificial intelligence, Primary clustering, Hash filter, Locality-sensitive hashing, Pattern recognition, Cryptographic hash function, Hash function, Mathematics, Machine learning
DocType
Journal
Volume
abs/1412.5517
Citations
0
PageRank
0.34
References
0
Authors
4
Name                        Order  Citations  PageRank
Jocelyn De Goer De Herve    1      0          0.34
Myoung-Ah Kang              2      47         9.77
Xavier Bailly               3      0          0.34
Engelbert Mephu Nguifo      4      554        67.75