Predicting Approximate Protein-DNA Binding Cores Using Association Rule Mining - Citegraph

Paper Info

Title
Predicting Approximate Protein-DNA Binding Cores Using Association Rule Mining

Abstract
The study of protein-DNA binding between transcription factors (TFs) and transcription factor binding sites (TFBSs) is an important bioinformatics topic. High-resolution (length < 10) TF-TFBS binding cores are discovered by expensive and time-consuming 3D structure experiments. Recent association rule mining approaches on low-resolution binding sequences (TF length > 490) have shown promise in identifying accurate binding cores without using any 3D structures. While the current association rule mining method for this problem addresses exact sequences only, the most recent ad hoc method for approximation does not establish any formal model and is limited by experimentally known patterns. As biological mutations are common, it is desirable to formally extend the exact model into an approximate one. In this paper, we formalize the problem of mining approximate protein-DNA association rules from sequence data and propose a novel efficient algorithm to predict protein-DNA binding cores. Our two-phase algorithm first constructs two compact intermediate structures called the frequent sequence tree (FS-Tree) and the frequent sequence class tree (FSCTree). Approximate association rules are then efficiently generated from these structures, and bioinformatics concepts (position weight matrix and information content) are further employed to prune meaningless rules. Experimental results on real data show the performance and applicability of the proposed algorithm.
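The abstract names position weight matrices and information content as the concepts used to prune rules, but does not give the pruning procedure. As a rough illustration of those two concepts only, the sketch below builds a per-position frequency matrix from equal-length DNA sites and sums the textbook per-column information content (2 bits minus the column entropy). The function names, the example sites, and the absence of any background correction are assumptions made for illustration; this is not the paper's algorithm.

```python
import math

ALPHABET = "ACGT"


def position_frequency_matrix(sites, pseudocount=0.0):
    """Return one {base: probability} dict per position of the aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in ALPHABET})
    return matrix


def information_content(matrix):
    """Sum over columns of (log2(4) - Shannon entropy), in bits."""
    max_bits = math.log2(len(ALPHABET))
    total = 0.0
    for column in matrix:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += max_bits - entropy
    return total


# Hypothetical aligned binding sites, not taken from the paper's data.
sites = ["TGACGT", "TGACGT", "TGACGA", "TGACGC"]
pfm = position_frequency_matrix(sites)
ic = information_content(pfm)
print(f"information content: {ic:.2f} bits")
```

A fully conserved column contributes 2 bits and a uniform column contributes 0, so a candidate pattern whose total falls below some chosen threshold could be treated as uninformative. Any such threshold would be a tuning choice that the abstract does not specify.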

Year	DOI	Venue
2012	10.1109/ICDE.2012.86	ICDE
Keywords	Field	DocType
novel efficient algorithm,biological mutation,frequent sequence tree,trees (mathematics),fs-tree,fsctree,3d structure experiment,protein-dna association rules,binding site,transcription factor binding sites,matrix algebra,exact sequence,compact intermediate structure,proteins,approximate association rule,accurate binding core,information content,tfbs,low-resolution binding sequences,approximate protein-dna binding cores,association rule mining,transcription factor,current association rule mining,predicting approximate protein-dna binding,data mining,dna,position weight matrix,frequent sequence class tree,bioinformatics,approximate protein-dna association rule,high-resolution binding cores,high resolution,association rules,transcription factor binding site,pulse width modulation,databases,association rule	Data mining,DNA binding site,Computer science,Matrix algebra,Position weight matrix,Theoretical computer science,DNA,Association rule learning,Data sequences	Conference
ISSN	ISBN	Citations
1063-6382	978-1-4673-0042-1	8
PageRank	References	Authors
0.57	22	4

Authors (4 rows)

Name	Order	Citations	PageRank
Po-Yuen Wong	1	11	0.95
Tak-ming Chan	2	190	13.57
Man Hon Wong	3	814	233.13
Kwong-Sak Leung	4	1887	205.58
