Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge.

Paper Info

Abstract
Motivation: More than a decade after microarrays were first used to predict the phenotype of biological samples, real-life applications for disease screening and for identifying patients who would benefit most from treatment are still emerging. The scientific community's interest in identifying the best approaches for developing such prediction models was reaffirmed in a competition-style international collaboration, the IMPROVER Diagnostic Signature Challenge, whose results we describe herein.

Results: Fifty-four teams used public data to develop prediction models in four disease areas (multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease) and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured various aspects of prediction quality, and the best performers were awarded. This article presents the challenge results and introduces to the community the approaches of the three best overall performers, as well as an R package that implements the approach of the best overall team. Analyses of the model performance data submitted in the challenge, together with additional simulations that we performed, revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection or classifier type) is problem dependent; and (iii) for optimal results, datasets and methods have to be carefully matched. Biomedical factors such as disease severity and confidence in the diagnosis were found to be associated with misclassification rates across the different teams.

Year	2013
DOI	10.1093/bioinformatics/btt492
Venue	Bioinformatics
Keywords	phenotype, gene expression profiling
Field	Data mining, Disease, Microarray, Feature selection, Computer science, Data pre-processing, Predictive modelling, Bioinformatics, Disease Screening, Classifier (linguistics), R package
DocType	Journal
Volume	29
Issue	22
ISSN	1367-4803
Citations	10
PageRank	0.98
References	3
Authors
20

Authors (20 rows)

Name	Order	Citations	PageRank
Adi L. Tarca	1	130	9.39
Mario Lauria	2	628	95.12
Michael Unger	3	10	0.98
Erhan Bilal	4	28	4.09
Stéphanie Boué	5	36	3.50
Kushal Kumar Dey	6	10	1.32
Julia Hoeng	7	91	10.97
Heinz Koeppl	8	159	36.18
Florian Martin	9	590	53.16
Pablo Meyer	10	62	8.26
Preetam Nandy	11	12	2.05
Raquel Norel	12	44	9.09
Manuel C. Peitsch	13	214	27.32
John Jeremy Rice	14	71	11.16
Roberto Romero	15	195	12.04
Gustavo Stolovitzky	16	738	51.84
Marja Talikka	17	41	3.70
Yang Xiang	18	18	1.74
Christoph Zechner	19	20	2.24
IMPROVER DSC Collaborators	20	10	0.98