Title
Matching experiments across species using expression values and textual information.
Abstract
With the vast increase in the number of gene expression datasets deposited in public databases, novel techniques are required to analyze and mine this wealth of data. Similar to the way BLAST enables cross-species comparison of sequence data, tools that enable cross-species expression comparison will allow us to better utilize these datasets: cross-species expression comparison enables us to address questions in evolution and development, and further allows the identification of disease-related genes and pathways that play similar roles in humans and model organisms. Unlike sequence, which is static, expression data changes over time and under different conditions. Thus, a prerequisite for performing cross-species analysis is the ability to match experiments across species.

To enable better cross-species comparisons, we developed methods for automatically identifying pairs of similar expression datasets across species. Our method uses a co-training algorithm to combine a model of expression similarity with a model of the text which accompanies the expression experiments. The co-training method outperforms previous methods based on expression similarity alone. Using expert analysis, we show that the new matches identified by our method indeed capture biological similarities across species. We then use the matched expression pairs between human and mouse to recover known and novel cycling genes as well as to identify genes with possible involvement in diabetes. By providing the ability to identify novel candidate genes in model organisms, our method opens the door to new models for studying diseases.

Source code and supplementary information are available at: www.andrew.cmu.edu/user/aaronwis/cotrain12.
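The abstract names co-training as the way the expression model and the text model are combined. As a rough illustration of that general idea only, the sketch below runs a generic co-training loop over two views of each candidate dataset pair. Everything in it is an assumption made for illustration: the function names `centroid_score` and `co_train`, the nearest-centroid classifier standing in for both models, and the toy similarity scores. It is not the authors' implementation, which is available at the URL above.

```python
from statistics import mean

def centroid_score(x, labeled):
    """Signed confidence for one view: > 0 means x is closer to the positive centroid.
    Stand-in classifier (assumption); the paper's actual models are not shown here."""
    pos = mean(v for v, y in labeled if y == 1)
    neg = mean(v for v, y in labeled if y == 0)
    return abs(x - neg) - abs(x - pos)

def co_train(expr, text, seeds, rounds=5, per_round=1):
    """Generic co-training over two views of the same candidate pairs.
    expr, text: one score per candidate pair (hypothetical features).
    seeds: dict index -> 0/1, and must contain at least one example of each class."""
    labels = dict(seeds)
    for _ in range(rounds):
        for view in (expr, text):  # the two views take turns labeling
            unlabeled = [i for i in range(len(view)) if i not in labels]
            if not unlabeled:
                return labels
            labeled = [(view[i], y) for i, y in labels.items()]
            # Rank unlabeled pairs by how confident this view's classifier is.
            ranked = sorted(unlabeled,
                            key=lambda i: abs(centroid_score(view[i], labeled)),
                            reverse=True)
            # The most confident predictions join the shared labeled set,
            # so the other view trains on them in its next turn.
            for i in ranked[:per_round]:
                labels[i] = 1 if centroid_score(view[i], labeled) > 0 else 0
    return labels

# Toy scores for six candidate pairs (made-up numbers, not from the paper).
expr_scores = [0.9, 0.8, 0.2, 0.1, 0.85, 0.15]
text_scores = [0.7, 0.9, 0.1, 0.3, 0.6, 0.2]
result = co_train(expr_scores, text_scores, seeds={0: 1, 3: 0})
print(result)
```

The point of the design is that each view only ever adds labels it is confident about, and those labels become training data for the other view, so two weak but complementary signals can reinforce each other from a small seed set.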
Year: 2012
DOI: 10.1093/bioinformatics/bts205
Venue: Bioinformatics
Keywords: expression experiment, cross-species comparison, expression value, matching experiment, gene expression, expression data change, expression pair, textual information, cross-species expression comparison, expression similarity, model organism, similar expression datasets, supplementary information
Field: Data mining, Candidate gene, Textual information, Source code, Computer science, Genomics, Data sequences, Bioinformatics, Model organism, Gene expression profiling
DocType: Journal
Volume: 28
Issue: 12
ISSN: 1367-4811
Citations: 0
PageRank: 0.34
References: 4
Authors: 3
Name, Order, Citations, PageRank
Aaron Wise, 1, 4, 0.77
Zoltán N. Oltvai, 2, 121, 10.87
Ziv Bar-Joseph, 3, 1207, 112.00