Title
Mining Gene Sets for Measuring Similarities
Abstract
In recent years, the development of high-throughput devices for the massively parallel analysis of genomic data has led to the generation of large amounts of new biological evidence and has triggered the proliferation of data mining algorithms for the extraction of meaningful information. Microarrays for gene expression analysis are part of this revolution and provide important insight into molecular biology, often in the form of coherent sets of genes representing previously uncharacterized processes. Large amounts of data are continuously produced in this form, and computational approaches can significantly improve the efficient use of these results, since comparisons among many gene sets can yield new, meaningful information at no additional cost from the experimental biology point of view. To address this opportunity, we designed and implemented FIT, a scalable, unsupervised algorithm that quantitatively compares different populations of gene sets using two distinct measures of similarity between any two gene sets. These measures are then used to obtain a summary statistic that describes the tightness of fit between sets belonging to two distinct populations of gene sets. We present the results of FIT on two data sets, for the study of lymphoma and of acute lymphoblastic leukemia. In both cases, FIT was able to recapitulate the previous analyses of these data sets, to extend their results, and to extract information likely to offer insights into the underlying biology.
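The abstract does not say which two similarity measures FIT uses or how its summary statistic is computed. Purely to illustrate the general shape of "pairwise set similarity, then a population-level tightness-of-fit score", the sketch below substitutes two common set-similarity measures (the Jaccard index and the overlap coefficient) and a hypothetical best-match average as the summary statistic. The function names `jaccard`, `overlap_coefficient` and `tightness_of_fit` are illustrative assumptions, not FIT's published definitions.

```python
from typing import Callable, FrozenSet, Sequence

# Illustrative types: a gene set is an immutable set of gene identifiers.
GeneSet = FrozenSet[str]
Similarity = Callable[[GeneSet, GeneSet], float]


def jaccard(a: GeneSet, b: GeneSet) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a: GeneSet, b: GeneSet) -> float:
    """Overlap coefficient |A ∩ B| / min(|A|, |B|); 0.0 if either set is empty."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def tightness_of_fit(
    pop_x: Sequence[GeneSet], pop_y: Sequence[GeneSet], similarity: Similarity
) -> float:
    """Hypothetical summary statistic (an assumption, not FIT's definition):
    match each set in pop_x to its most similar set in pop_y and average
    those best-match scores. With the measures above, the result is in [0, 1]."""
    if not pop_x or not pop_y:
        raise ValueError("both populations must contain at least one gene set")
    best_scores = [max(similarity(x, y) for y in pop_y) for x in pop_x]
    return sum(best_scores) / len(best_scores)


if __name__ == "__main__":
    # Toy populations with made-up gene identifiers (not data from the paper).
    pop_a = [frozenset({"G1", "G2", "G3"}), frozenset({"G4", "G5"})]
    pop_b = [frozenset({"G1", "G2", "G6"}), frozenset({"G4", "G5", "G7"})]
    for name, sim in [("Jaccard", jaccard), ("overlap", overlap_coefficient)]:
        print(name, round(tightness_of_fit(pop_a, pop_b, sim), 4))
```

On the toy data this prints 0.5833 for the Jaccard variant and 0.8333 for the overlap variant: the overlap coefficient scores a small set fully contained in a larger one as a perfect match, while the Jaccard index penalizes the size difference, which is one reason to report two measures side by side. Note that this best-match average is asymmetric (swapping `pop_x` and `pop_y` can change the score); a symmetric variant could average both directions.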
Year: 2006
DOI: 10.1109/ISCC.2006.101
Venue: ISCC
Keywords: gene expression analysis, meaningful information, genomic data, mining gene sets, measuring similarities, large amount, cases fit, gene set, molecular biology, experimental biology point, data mining algorithm, data analysis, gene expression, genomics, algorithm design and analysis, bioinformatics, information analysis, high throughput, data mining, throughput
Field: Data mining, Data set, Gene, Algorithm design, Computer science, Lymphoblastic Leukemia, Genomics, Data mining algorithm, DNA microarray, Scalability
DocType: Conference
ISSN: 1530-1346
ISBN: 0-7695-2588-1
Citations: 1
PageRank: 0.43
References: 8
Authors: 7
Name | Order | Citations | PageRank
Christine Nardini | 1 | 65 | 9.00
Daniele Masotti | 2 | 27 | 3.86
Sungroh Yoon | 3 | 566 | 78.80
Enrico Macii | 4 | 2405 | 349.96
Michael D. Kuo | 5 | 9 | 1.23
Giovanni De Micheli | 6 | 10245 | 1018.13
Luca Benini | 7 | 13116 | 1188.49