Mining Pure Patterns in Texts - Citegraph

Paper Info

Title
Mining Pure Patterns in Texts

Abstract
We investigate the problem of finding unusual patterns in a text given as a string. In this paper, a pattern is a substring of that string. The natural assumption about pattern frequency is that the shorter a pattern is, the higher its frequency. We define a pattern to be pure if every substring of the pattern has the same frequency as the pattern itself, which means that those substrings appear in the string only within the pattern. This condition runs counter to the natural assumption. We propose three statistics for quantifying the purity of a pattern, namely probability, entropy, and difference, each calculated from the frequencies of the pattern and its substrings. Experiments on DNA sequences show that patterns with large probability correspond to features of the sequences.
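
The purity definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it does not cover the three statistics, whose formulas are not given in the abstract. The function names (`count_occurrences`, `substrings`, `is_pure`) are hypothetical, and the sketch assumes that frequency means the number of occurrences with overlaps counted, which the abstract does not specify.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, counting overlapping matches.

    Assumption: "frequency" is the overlapping occurrence count.
    """
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are found too.
        start = text.find(pattern, start + 1)
    return count


def substrings(pattern: str) -> set[str]:
    """Return all distinct non-empty substrings of pattern (including itself)."""
    n = len(pattern)
    return {pattern[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def is_pure(text: str, pattern: str) -> bool:
    """Check the abstract's condition: every substring of the pattern
    occurs in the text exactly as often as the pattern does."""
    freq = count_occurrences(text, pattern)
    if freq == 0:
        # A pattern that never occurs is treated as not pure (an assumption).
        return False
    return all(count_occurrences(text, s) == freq for s in substrings(pattern))


text = "ACGTTTACGTT"
print(is_pure(text, "ACG"))  # A, C, G, AC, CG all occur only inside ACG
print(is_pure(text, "GT"))   # T also occurs outside GT
```

Under this strict reading, a pattern that repeats a character (for example `AA`) can never be pure, because the single character occurs more often than the pattern itself. This brute-force check recounts every substring and is meant only to make the definition concrete, not to be efficient on long sequences.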

Year	DOI	Venue
2012	10.1109/IIAI-AAI.2012.75	IIAI-AAI
Keywords	Field	DocType
large probability,dna sequence,natural assumption,present paper,sub string,mining pure patterns,unusual pattern,entropy,dna,databases,text mining,probability,statistical analysis,data mining,text analysis	Text mining,Pattern recognition,Artificial intelligence,Mathematics,Statistical analysis	Conference
Citations	PageRank	References
2	0.46	4
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Yasuhiro Yamada	1	52	10.97
Tetsuya Nakatoh	2	46	12.64
Kensuke Baba	3	56	18.62
Daisuke Ikeda	4	52	7.95
