Title
Classifying Big DNA Methylation Data: A Gene-Oriented Approach.
Abstract
Thanks to Next Generation Sequencing (NGS) techniques, publicly available genomic data on cancer is growing quickly. The largest public cancer database, The Cancer Genome Atlas (TCGA), contains huge amounts of biomedical big data to be analyzed with advanced knowledge extraction methods. In this work, we focus on the NGS experiment of DNA methylation, whose data matrices are composed of hundreds of thousands of features (i.e., methylated sites). We propose an efficient data processing procedure that yields a gene-oriented organization of the data and enables supervised machine learning analysis with state-of-the-art methods. The procedure divides the original data matrices into several sub-matrices, each containing the sites located within the same gene. We extract DNA methylation data for three tumor types (i.e., breast, prostate, and thyroid carcinomas) from TCGA and successfully discriminate tumoral from non-tumoral samples using function-, tree-, and rule-based classifiers. Finally, we select the best-performing genes (matrices) by ranking them according to classifier accuracy, and we perform an enrichment analysis on them. These genes can be further investigated by domain experts to establish their relation to the cancers under study.
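The gene-oriented procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy gene labels (GENE_A, GENE_B), and the toy values are hypothetical, and a simple nearest-centroid classifier with leave-one-out evaluation stands in for the function-, tree-, and rule-based classifiers used in the paper. The sketch only shows the general shape of the idea: split a samples-by-sites matrix into one sub-matrix per gene, score each sub-matrix with a classifier, and rank genes by accuracy.

```python
from collections import defaultdict


def split_by_gene(matrix, site_to_gene):
    """Group the columns of a samples-by-sites matrix by gene.

    matrix: list of rows, one per sample, each a list of methylation values.
    site_to_gene: for each column, the gene it falls in (or None).
    Returns {gene: sub-matrix containing only that gene's columns}.
    """
    columns = defaultdict(list)
    for col, gene in enumerate(site_to_gene):
        if gene is not None:  # sites outside any gene are skipped
            columns[gene].append(col)
    return {
        gene: [[row[c] for c in cols] for row in matrix]
        for gene, cols in columns.items()
    }


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def loo_accuracy(sub_matrix, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumption: this stand-in classifier is for illustration only.
    """
    correct = 0
    for i, (sample, truth) in enumerate(zip(sub_matrix, labels)):
        # Build per-class training sets from every sample except sample i.
        by_class = defaultdict(list)
        for j, (row, label) in enumerate(zip(sub_matrix, labels)):
            if j != i:
                by_class[label].append(row)
        # Predict the class whose centroid is closest in squared distance.
        predicted = min(
            by_class,
            key=lambda lab: sum(
                (a - b) ** 2 for a, b in zip(sample, centroid(by_class[lab]))
            ),
        )
        correct += predicted == truth
    return correct / len(labels)


def rank_genes(matrix, site_to_gene, labels):
    """Return (gene, accuracy) pairs sorted from best to worst accuracy."""
    scores = {
        gene: loo_accuracy(sub, labels)
        for gene, sub in split_by_gene(matrix, site_to_gene).items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: 4 samples x 4 sites.
# GENE_A separates the two classes; GENE_B carries no class signal.
toy_matrix = [
    [0.90, 0.80, 0.50, 0.40],
    [0.85, 0.90, 0.40, 0.50],
    [0.10, 0.20, 0.50, 0.40],
    [0.20, 0.10, 0.40, 0.50],
]
toy_sites = ["GENE_A", "GENE_A", "GENE_B", "GENE_B"]
toy_labels = ["tumoral", "tumoral", "normal", "normal"]
ranking = rank_genes(toy_matrix, toy_sites, toy_labels)
```

On this toy input, GENE_A ranks first because its two sites cleanly separate tumoral from normal samples. On real data, each per-gene sub-matrix is small compared with the full matrix of hundreds of thousands of sites, which is what makes classifying every gene separately tractable.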
Year: 2018
DOI: 10.1007/978-3-319-99133-7_11
Venue: Communications in Computer and Information Science
Keywords: Classification, DNA methylation, Cancer
Field: Genome, Data mining, Gene, Ranking, Computer science, DNA methylation, DNA sequencing, Knowledge extraction, Computational biology, Big data, Cancer
DocType: Conference
Volume: 903
ISSN: 1865-0929
Citations: 2
PageRank: 0.36
References: 12
Authors: 5
Name               Order  Citations  PageRank
Emanuel Weitschek  1      84         10.63
Fabio Cumbo        2      15         4.12
Eleonora Cappelli  3      5          3.17
Giovanni Felici    4      2          0.36
Paola Bertolazzi   5      352        32.81