Title
A multivariate Bernoulli model to predict DNaseI hypersensitivity status from haplotype data
Abstract
Motivation: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status, an epigenetic mark describing chromatin accessibility, from population-scale haplotype data.
Results: We fit the MVB model to real data from 59 individuals on whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R² = 0.12 in cross-validation). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R² than the two competing methods in empirical data.
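To make the idea of "interactions among all sets of variants" concrete, the following is a minimal Python sketch of the general MVB form, in which the log-probability of a binary vector is a sum of one coefficient per subset of variables that are all equal to 1. It is an illustration only, not the fitting algorithm from the article: the function names, the toy coefficients in `theta`, and the choice to treat variable 2 as DH status are all assumptions, and the normalizing constant is computed by brute-force enumeration, which is feasible only for a handful of variables.

```python
import itertools
import math


def mvb_log_weight(x, theta):
    """Unnormalized log-probability of binary vector x.

    theta maps a tuple of variable indices (a subset S) to its coefficient;
    a term contributes only when every variable in S equals 1.
    """
    return sum(w for subset, w in theta.items() if all(x[i] for i in subset))


def mvb_pmf(n, theta):
    """Normalized probabilities over all 2**n binary vectors (brute force)."""
    states = list(itertools.product((0, 1), repeat=n))
    weights = [math.exp(mvb_log_weight(x, theta)) for x in states]
    z = sum(weights)
    return {x: w / z for x, w in zip(states, weights)}


def conditional_prob_one(pmf, target, observed):
    """P(x[target] = 1 | x[i] = v for each (i, v) in observed)."""
    num = den = 0.0
    for x, p in pmf.items():
        if all(x[i] == v for i, v in observed.items()):
            den += p
            if x[target] == 1:
                num += p
    return num / den


# Hypothetical sparse parameters: variables 0 and 1 play the role of variant
# alleles, variable 2 plays the role of DH status. Subsets that are absent
# from the dictionary have coefficient zero.
theta = {(0,): -0.5, (1,): 0.2, (2,): -1.0, (0, 2): 1.5, (0, 1, 2): 0.8}

pmf = mvb_pmf(3, theta)
p_dh = conditional_prob_one(pmf, target=2, observed={0: 1, 1: 1})
```

In this form, the conditional probability of one variable given the rest is a logistic function of the coefficients whose subsets contain that variable; here `p_dh` equals the logistic function evaluated at -1.0 + 1.5 + 0.8. Sparsity corresponds to most subsets being left out of `theta`.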
Year
2015
DOI
10.1093/bioinformatics/btv397
Venue
BIOINFORMATICS
Field
Data mining, Linear model, Multivariate statistics, Computer science, Haplotype, Software, Bioinformatics, Multivariate analysis, Bernoulli's principle
DocType
Journal
Volume
31
Issue
21
ISSN
1367-4803
Citations
0
PageRank
0.34
References
3
Authors
3
Name | Order | Citations | PageRank
Huwenbo Shi | 1 | 16 | 3.26
Bogdan Paşaniuc | 2 | 95 | 15.06
Kenneth Lange | 3 | 3 | 2.80