Title
Does encoding matter? A novel view on the quantitative genetic trait prediction problem
Abstract
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal, or human samples, the goal of genetic trait prediction is to predict quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression model, which requires quantitative encodings of the genotypes. There is a large body of work on prediction algorithms, but none of it has investigated how the encoding affects the genetic trait prediction problem. In this work, we view the problem from a novel angle: as a multiple regression on categorical data, which requires encoding the categorical data as numerical data. We evaluate various encoding mechanisms and investigate theoretically how different encodings affect the performance of genetic trait prediction algorithms. To our knowledge, this is the first analysis of different encoding mechanisms for the genetic trait prediction problem. We further propose two novel encoding methods and show that they generate numerical features with higher predictive power. Our experiments show that our methods outperform the other encoding methods for both the single-marker model and the epistasis model.
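To make the idea of "encoding categorical genotypes for a linear model" concrete, the following is a minimal sketch and not the authors' method. It assumes two common textbook encodings (an additive allele count and a one-hot indicator encoding) and a closed-form ridge regression; the two novel encodings proposed in the paper are not described in the abstract and are not reproduced here. The genotype labels, trait values, and function names are hypothetical.

```python
import numpy as np

# Hypothetical genotype labels for a biallelic marker.
CLASSES = ("AA", "Aa", "aa")


def encode_additive(genotypes):
    """Encode each marker as an allele count: AA -> 0, Aa -> 1, aa -> 2."""
    mapping = {"AA": 0.0, "Aa": 1.0, "aa": 2.0}
    return np.array([[mapping[g] for g in row] for row in genotypes])


def encode_one_hot(genotypes):
    """Encode each marker as three indicator columns, one per genotype class."""
    return np.array(
        [[float(g == c) for g in row for c in CLASSES] for row in genotypes]
    )


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc; lam > 0 keeps the system well-posed.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def ridge_predict(X, w, b):
    """Predict trait values from encoded genotypes."""
    return X @ w + b


# Toy data (made up for illustration): 4 samples, 2 markers.
genotypes = [["AA", "Aa"], ["Aa", "aa"], ["aa", "AA"], ["AA", "aa"]]
trait = np.array([1.0, 2.5, 2.0, 1.8])

X_add = encode_additive(genotypes)   # shape (4, 2)
X_hot = encode_one_hot(genotypes)    # shape (4, 6)

w_add, b_add = ridge_fit(X_add, trait)
w_hot, b_hot = ridge_fit(X_hot, trait)
pred_add = ridge_predict(X_add, w_add, b_add)
pred_hot = ridge_predict(X_hot, w_hot, b_hot)
```

The same samples yield different feature matrices under the two encodings, so the same regression algorithm can fit different models; this is the kind of effect the abstract says the paper analyzes.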
Year: 2015
DOI: 10.1109/BIBM.2015.7359667
Venue: IEEE International Conference on Bioinformatics and Biomedicine
Keywords: Encoding, Epistasis, Quantitative genetic trait prediction, Ridge regression
Field: Quantitative trait locus, Trait, Epistasis, Computer science, Categorical variable, Prediction algorithms, Artificial intelligence, Single-nucleotide polymorphism, Bioinformatics, Machine learning, Encoding (memory), Linear regression
DocType: Conference
Volume: 17
Issue: S-9
ISSN: 1471-2105
ISBN: 978-1-4673-6799-8
Citations: 0
PageRank: 0.34
References: 4
Authors: 2
Name, Order, Citations, PageRank
Dan He, 1, 133, 12.54
Laxmi Parida, 2, 773, 77.21