Title
EST clustering error evaluation and correction
Abstract
Motivation: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in these analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated.
Results: We identify and quantify two types of EST clustering error, Type I and Type II, in EST clustering using the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is [...] P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that ∼80% of the Type I error is due to insufficient overlap among sibling ESTs (ISO error) in 5' EST clustering. A novel statistical approach is proposed to correct ISO error and provide more accurate estimates of the true gene cluster profile.
Availability: We have automated the methods developed in this paper in a web-based software, ESTstat, at http://cwdg5.bio.psu.edu/eststat.
Supplementary information: http://cwdg5.bio.psu.edu/eststat
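The two error types defined in the abstract can be illustrated with a small sketch. The abstract does not say how the error rates are computed, so the pairwise definition used here is an assumption made for illustration only: it is not the paper's estimator and not part of CAP3. The function name `pairwise_error_rates` and the toy EST and gene labels are hypothetical.

```python
from itertools import combinations


def pairwise_error_rates(true_gene, cluster):
    """Return (type_i_rate, type_ii_rate) computed over all EST pairs.

    true_gene: dict mapping EST id -> known gene of origin (assumed ground truth)
    cluster:   dict mapping EST id -> cluster label assigned by the assembler
    Assumption: rates are pair-based; the paper may define them differently.
    """
    same_gene = split = diff_gene = merged = 0
    for a, b in combinations(sorted(true_gene), 2):
        if true_gene[a] == true_gene[b]:
            same_gene += 1
            if cluster[a] != cluster[b]:
                split += 1   # Type I: sibling ESTs not clustered together
        else:
            diff_gene += 1
            if cluster[a] == cluster[b]:
                merged += 1  # Type II: ESTs from distinct genes clustered together
    type_i = split / same_gene if same_gene else 0.0
    type_ii = merged / diff_gene if diff_gene else 0.0
    return type_i, type_ii


# Toy data: gene A has three ESTs, gene B has two.
true_gene = {"e1": "A", "e2": "A", "e3": "A", "e4": "B", "e5": "B"}
cluster = {"e1": 1, "e2": 1, "e3": 2, "e4": 2, "e5": 3}
type_i, type_ii = pairwise_error_rates(true_gene, cluster)
print(f"Type I rate: {type_i:.3f}, Type II rate: {type_ii:.3f}")
```

In this toy case, e3 is separated from its siblings e1 and e2 and e4 is separated from e5 (Type I), while e3 and e4 share a cluster despite coming from different genes (Type II).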
Year: 2004
DOI: 10.1093/bioinformatics/bth342
Venue: Bioinformatics
Keywords: gene expression, type I error, gene cluster, cDNA library, expressed sequence tag, error correction, type II error
Field: Gene cluster, Data mining, Expressed sequence tag, Error detection and correction, Bioinformatics, Type I and type II errors, Cluster analysis, Mathematics
DocType: Journal
Volume: 20
Issue: 17
ISSN: 1367-4803
Citations: 10
PageRank: 1.02
References: 5
Authors: 7
Name                   Order  Citations  PageRank
Ji-ping Z. Wang        1      12         1.59
Bruce G. Lindsay       2      1173638.97
James Leebens-Mack     3      16         2.14
Liying Cui             4      44         4.28
P. Kerr Wall           5      10         1.02
W. Miller              6      1301295.71
Claude W. dePamphilis  7      56         5.55