Title
Parallel classification and feature selection in microarray data using SPRINT
Abstract
The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time-consuming or even impossible with the existing software infrastructure. High performance computing (HPC) systems offer a solution to these problems, but at the expense of increased complexity for the end user. The Simple Parallel R Interface (SPRINT) is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelised replacements of existing R functions. In this paper we describe parallel implementations of two popular techniques: exploratory clustering analyses using the random forest classifier, and feature selection through identification of differentially expressed genes using the rank product method. Copyright © 2012 John Wiley & Sons, Ltd.
Year
2014
DOI
10.1002/cpe.2928
Venue
Concurrency and Computation: Practice & Experience
Keywords
Genomics, HPC, Parallel programming
Field
Data mining, Feature selection, End user, Supercomputer, Computer science, R interface, Rank product, Software, Random forest, Cluster analysis
DocType
Journal
Volume
26
Issue
4
ISSN
1532-0626
Citations
3
PageRank
0.54
References
9
Authors
7
Name               Order  Citations  PageRank
L Mitchell         1      20         1.95
Terence M. Sloan   2      55         5.43
Muriel Mewissen    3      58         3.67
Peter Ghazal       4      115        4.82
Thorsten Forster   5      43         2.23
Michal Piotrowski  6      6          1.71
Arthur S. Trew     7      41         2.12