Mining Gene Sets for Measuring Similarities - Citegraph

Paper Info

Title
Mining Gene Sets for Measuring Similarities

Abstract
In recent years, the development of high-throughput devices for the massively parallel analysis of genomic data has led to the generation of large amounts of new biological evidence and has triggered the proliferation of data mining algorithms for the extraction of meaningful information. Microarrays for gene expression analysis are part of this revolution and provide important insight into molecular biology, often in the form of coherent sets of genes representing previously uncharacterized processes. Large amounts of data are continuously produced in this form, and computational approaches can significantly improve the efficient use of these results, since comparisons among numbers of gene sets can yield new meaningful information at no cost from the experimental biology point of view. To address this opportunity, we designed and implemented FIT, a scalable, unsupervised algorithm that quantitatively compares different populations of gene sets using two distinct measures of similarity between any two gene sets. These measures are then used to obtain a summary statistic that describes the tightness of fit between sets belonging to two distinct populations of gene sets. We present the results of FIT on two data sets for the study of lymphoma and acute lymphoblastic leukemia. In both cases, FIT was able to recapitulate the previous analyses on these datasets, to extend the results, and to extract information likely to offer insights into the underlying biology.

Year	DOI	Venue
2006	10.1109/ISCC.2006.101	ISCC
Keywords	Field	DocType
gene expression analysis,meaningful information,genomic data,mining gene sets,measuring similarities,large amount,cases fit,gene set,molecular biology,experimental biology point,data mining algorithm,data analysis,gene expression,genomics,algorithm design and analysis,bioinformatics,information analysis,high throughput,data mining,throughput	Data mining,Data set,Gene,Algorithm design,Computer science,Lymphoblastic Leukemia,Genomics,Data mining algorithm,DNA microarray,Scalability	Conference
ISSN	ISBN	Citations
1530-1346	0-7695-2588-1	1
PageRank	References	Authors
0.43	8	7

Authors (7 rows)

Name	Order	Citations	PageRank
Christine Nardini	1	65	9.00
Daniele Masotti	2	27	3.86
Sungroh Yoon	3	566	78.80
Enrico Macii	4	2405	349.96
Michael D. Kuo	5	9	1.23
Giovanni De Micheli	6	10245	1018.13
Luca Benini	7	13116	1188.49
