Title
The PEPR GeneChip data warehouse, and implementation of a dynamic time series query tool (SGQT) with graphical interface.
Abstract
Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes, as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provide access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp).
Year: 2004
DOI: 10.1093/nar/gkh003
Venue: NUCLEIC ACIDS RESEARCH
Keywords: linear code, data warehouse, gene structure, time series, quality control, graphical interface, genomic dna, association analysis, microarray data, web accessibility
Field: Data warehouse, Data mining, Clustering high-dimensional data, Biology, Oracle, Genomics, Graphical user interface, Software, Gene chip analysis, Bioinformatics, Genetics, Gene expression profiling
DocType: Journal
Volume: 32
Issue: Database issue
ISSN: 0305-1048
Citations: 7
PageRank: 0.87
References: 2
Authors: 8
Name | Order | Citations | PageRank
Josephine Chen | 1 | 7 | 0.87
Po Zhao | 2 | 73 | 6.87
Donald Massaro | 3 | 7 | 0.87
Linda B Clerch | 4 | 7 | 0.87
Richard R Almon | 5 | 42 | 2.19
Debra C Dubois | 6 | 42 | 2.19
William J Jusko | 7 | 42 | 2.53
Eric P. Hoffman | 8 | 273 | 22.82