Title
An XML-Based Approach to Integrating Heterogeneous Yeast Genome Data
Abstract
While there is an increasing number of genomes (including the human genome) whose sequences have been fully or nearly completed, the budding yeast Saccharomyces cerevisiae was the first fully sequenced eukaryotic genome. Given its ease of genetic manipulation and the fact that many of its genes are strikingly similar to human genes, the yeast genome has been studied extensively through a wide range of biological experiments (e.g., microarray experiments). As a result, a large variety of yeast genome data types have been generated and made accessible through many resources (e.g., SGD, MIPS, and YPD). While these resources serve many specific needs of individual researchers, we can reap more benefits by integrating these disparate datasets to facilitate larger-context data mining. However, such integrated analysis is hampered by the heterogeneous formats that are used for data distribution. With the increasing use of eXtensible Markup Language (XML) in the bioinformatics domain, we demonstrate how to use XML to standardize the exchange of a variety of types of yeast data between different resources. In particular, we propose a standard XML format called "Yeast Hub XML" (YHX). This format consists of two parts: i) metadata and ii) data. The former describes the resource and data structure, while the latter represents the data itself. In addition, we apply various XML-related technologies, including XPath and XSLT, to query, integrate, and transform multiple XML datasets. We have implemented a prototype yeast hub server that allows sharing, querying, and integration of different types and formats of yeast genome data that are located in disparate sources.
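The two-part layout described in the abstract (metadata describing the resource and data structure, followed by the data) can be illustrated with a minimal sketch. The abstract does not give the actual YHX schema, so every element name, attribute name, and value below is a hypothetical placeholder, and Python's standard `xml.etree.ElementTree` with its limited XPath subset stands in for the XPath/XSLT tooling the authors mention.

```python
import xml.etree.ElementTree as ET

# Hypothetical YHX-like document. The element and attribute names and all
# values are placeholders for illustration, not the published YHX schema.
DOC = """\
<yhx>
  <metadata>
    <resource name="ExampleResource" url="http://example.org/"/>
    <field name="orf" type="string"/>
    <field name="essential" type="boolean"/>
  </metadata>
  <data>
    <record><orf>ORF_A</orf><essential>true</essential></record>
    <record><orf>ORF_B</orf><essential>false</essential></record>
  </data>
</yhx>
"""


def field_names(root):
    """Read the data structure (column names) from the metadata part."""
    return [f.get("name") for f in root.findall("./metadata/field")]


def essential_orfs(root):
    """Query the data part with an XPath predicate on a child's text."""
    matches = root.findall("./data/record[essential='true']")
    return [r.findtext("orf") for r in matches]


root = ET.fromstring(DOC)
print(field_names(root))
print(essential_orfs(root))
```

Keeping the field descriptions in the same document as the records means a consumer can discover the columns before querying them, which is one plausible reading of why the format separates metadata from data.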
Year
2004
Venue
METMBS '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MATHEMATICS AND ENGINEERING TECHNIQUES IN MEDICINE AND BIOLOGICAL SCIENCES
Keywords
data mining, data structure, genetics, human genome
Field
Genome, XML, Computer science, Yeast, Computational biology, Genetics
DocType
Conference
Citations
1
PageRank
0.73
References
12
Authors
6
Name                Order  Citations  PageRank
Kei-hoi Cheung      1      664        60.65
Deyun Pan           2      34         3.71
Andrew Smith        3      27         6.55
Michael Seringhaus  4      10         1.42
Shawn M. Douglas    5      1          1.07
Mark Gerstein       6      354        45.41