Title
Latent Dirichlet Allocation For Classification Using Gene Expression Data
Abstract
Understanding the role of differential gene expression in the development of, and molecular response to, cancer remains a challenging problem, in part due to the sheer number of genes, gene products, and metabolites involved. In this paper, we employ an unsupervised topic model, Latent Dirichlet Allocation (LDA), to explore patterns of gene expression in healthy and cancer tissues. An important advantage of LDA over alternative statistical and machine learning methods is its proven ability to handle sparse inputs over an extremely large number of features in an unsupervised manner. LDA has recently been applied to clustering and exploring genomic data, but not to classification and prediction. In this paper, we try to optimize the protocol and parameters for efficient implementation of LDA. Messenger RNA (mRNA) sequence data from breast cancer and healthy tissue are used to determine an effective approach for applying LDA to the classification of cancer versus healthy tissue. We describe our study in two phases. First, parameters such as the number of topics, bins, and passes were optimized for LDA. Next, we developed a novel LDA-based classification approach to classify unknown samples based on the similarity of co-expression patterns. Our evaluation of this approach shows that LDA can achieve high accuracy compared to alternative approaches. Overall, our results suggest that LDA is a promising approach for classifying tissue types based on gene expression data in cancer studies.
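The abstract names "bins" as one of the tuned parameters but does not say how expression values become LDA input. The following is a minimal sketch of one plausible preprocessing step, not the authors' method: each sample is treated as a document, each gene as a word, and the word count is the index of the equal-width bin that the expression value falls into. The gene names, the value range, the bin count, and the helper names `bin_index` and `sample_to_counts` are all hypothetical.

```python
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Return the equal-width bin index (0 .. n_bins - 1) of value in [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) * n_bins / (hi - lo))
    # Clamp so that out-of-range values and value == hi stay in a valid bin.
    return min(max(idx, 0), n_bins - 1)


def sample_to_counts(expression, ranges, n_bins):
    """Convert one sample's {gene: expression} map into a bag-of-genes count.

    Assumption: the bin index is used as the word count, so genes in the
    lowest bin contribute nothing and the resulting document stays sparse.
    """
    counts = Counter()
    for gene, value in expression.items():
        lo, hi = ranges[gene]
        k = bin_index(value, lo, hi, n_bins)
        if k > 0:
            counts[gene] = k
    return counts


# Hypothetical toy data: two samples, three genes, expression in [0, 10].
samples = {
    "tumor_1": {"GENE_A": 9.5, "GENE_B": 0.2, "GENE_C": 4.0},
    "normal_1": {"GENE_A": 1.0, "GENE_B": 7.8, "GENE_C": 4.1},
}
ranges = {gene: (0.0, 10.0) for gene in ("GENE_A", "GENE_B", "GENE_C")}

docs = {
    name: sample_to_counts(expr, ranges, n_bins=5)
    for name, expr in samples.items()
}
```

Count vectors of this kind could then be passed to any LDA implementation, with the number of bins tuned alongside the number of topics and training passes.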
Year
2017
DOI
10.1109/BIBE.2017.00014
Venue
2017 IEEE 17TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE)
Keywords
Topic modeling, Latent Dirichlet Allocation, Classification, Machine learning, Cancer, Gene expression
Field
Computer science, Linear model, Support vector machine, Delirium, Boosting (machine learning), Artificial intelligence, Random forest, Artificial neural network, Generalized additive model, Machine learning, Gradient boosting
DocType
Conference
Volume
2017
ISSN
2471-7819
Citations
0
PageRank
0.34
References
0
Authors
6
Name, Order, Citations, PageRank
Anis Davoudi, 1, 4, 3.44
Tezcan Ozrazgat-Baslanti, 2, 1, 2.37
Ashkan Ebadi, 3, 48, 6.90
Alberto C. Bursian, 4, 0, 0.34
Azra Bihorac, 5, 50, 8.63
Parisa Rashidi, 6, 859, 46.92