Paper Info

Title
Induction of comprehensible models for gene expression datasets by subgroup discovery methodology.

Abstract
Finding disease markers (classifiers) from gene expression data by machine learning algorithms is characterized by a high risk of overfitting the data due to the abundance of attributes (simultaneously measured gene expression values) and shortage of available examples (observations). To avoid this pitfall and achieve predictor robustness, state-of-the-art approaches construct complex classifiers that combine relatively weak contributions of up to thousands of genes (attributes) to classify a disease. The complexity of such classifiers limits their transparency and consequently the biological insights they can provide. The goal of this study is to apply to this domain the methodology of constructing simple yet robust logic-based classifiers amenable to direct expert interpretation. On two well-known, publicly available gene expression classification problems, the paper shows the feasibility of this approach, employing a recently developed subgroup discovery methodology. Some of the discovered classifiers allow for novel biological interpretations.

Year: 2004
DOI: 10.1016/j.jbi.2004.07.007
Venue: Journal of Biomedical Informatics
Keywords: complex classifier, disease markers, gene expression measurements, gene expression datasets, disease marker, comprehensible classification, subgroup discovery, novel biological interpretation, available example, subgroup discovery methodology, direct expert interpretation, gene expression value, comprehensible model, gene expression data, biological insight, available gene expression classification, machine learning, gene expression
Field: Data mining, Computer science, Robustness (computer science), Overfitting, Economic shortage, Disease markers
DocType: Journal
Volume: 37
Issue: 4
ISSN: 1532-0464
Citations: 27
PageRank: 1.22
References: 14
Authors: 4

Authors (4 rows)

Name	Order	Citations	PageRank
Dragan Gamberger	1	757	60.53
Nada Lavrač	2	989	72.19
Filip Železný	3	129	13.09
Jakub Tolar	4	89	5.88