Title
A New Framework for Spatial Modeling and Synthesis of Genomic Sequences.
Abstract
This paper provides a framework for statistical modeling of genomic sequences. Such a framework can be used as the basis for synthesizing similar sequences, which could then be used to make further inferences about the original genomic sequences. We start by converting the sequence of nucleotides from the genome into a decimal sequence via Huffman coding. Using the Hodrick-Prescott filter (HP filter), this decimal sequence is decomposed into two components, namely trend and cyclic. Next, the ARIMA-GARCH statistical modeling approach is applied to the trend component, which exhibits heteroskedasticity. The autoregressive integrated moving average (ARIMA) model is used to capture the linear characteristics of the sequence, while the generalized autoregressive conditional heteroskedasticity (GARCH) model is applied to capture the statistical nonlinearity of the genome sequence. This modeling approach allows us to synthesize a given genomic sequence based on its statistical characteristics. Finally, the probability distribution function (PDF) of a given sequence is estimated using a Gaussian mixture model, and based on the estimated PDF, we determine a new PDF representing sequences that statistically counteract the original sequence. We applied the proposed framework to several genes, as well as to the HIV nucleotide sequence. The corresponding results show some promise.
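The first two steps of the pipeline described in the abstract (Huffman coding of the nucleotides, then HP-filter decomposition) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how codewords become decimal values, so reading each codeword as a base-2 integer is an assumption made here for illustration. The function names, the toy sequence, and the smoothing parameter `lam=1600.0` are likewise hypothetical choices. The HP trend is computed from the standard penalized least-squares form, solving (I + lam * D'D) trend = y, where D is the second-difference operator.

```python
import heapq
from collections import Counter

import numpy as np


def huffman_codebook(sequence):
    """Build a prefix-free binary code (symbol -> bit string) from symbol counts."""
    counts = Counter(sequence)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Prepend one bit to every codeword in each merged subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def to_decimal_sequence(sequence, codebook):
    """ASSUMPTION: read each codeword as a base-2 integer (not taken from the paper)."""
    return np.array([int(codebook[s], 2) for s in sequence], dtype=float)


def hp_filter(y, lam=1600.0):
    """Hodrick-Prescott decomposition of y (length >= 3) into (trend, cyclic).

    lam is a placeholder smoothing value, not one reported in the paper.
    """
    n = len(y)
    d = np.zeros((n - 2, n))  # second-difference operator
    for i in range(n - 2):
        d[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lam * d.T @ d, y)
    return trend, y - trend


dna = "ATGCGATACGCTTGAGGCTA"  # toy sequence, for illustration only
book = huffman_codebook(dna)
y = to_decimal_sequence(dna, book)
trend, cycle = hp_filter(y)
```

The dense solve keeps the sketch short. For sequences of genomic length, a sparse banded solver would be the more practical choice. The later ARIMA-GARCH and Gaussian mixture steps are not shown.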
Year
2020
DOI
10.1109/BIBM49941.2020.9313090
Venue
BIBM
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
4
Name                  Order  Citations  PageRank
Salman Mohamadi       1      0          0.34
Donald A. Adjeroh     2      811        64.20
Behnoush Behi         3      0          0.34
Hamidreza Amindavar   4      215        36.34