A model for estimating the occurrence of same-frequency words and the boundary between high- and low-frequency words in texts - Citegraph

Paper Info

Title
A model for estimating the occurrence of same-frequency words and the boundary between high- and low-frequency words in texts

Abstract
A simpler model is proposed for estimating the frequency of any same-frequency words and identifying the boundary point between high-frequency words and low-frequency words in a text. The model, based on a "maximum ranking method," assigns ranks to the words and estimates word frequency by the formula: $\mathrm{Int}\left[\left(-1 + \left(1 + 4D/I_{n+1}\right)^{1/2}\right)/2\right] > n \geq \mathrm{Int}\left[\left(-1 + \left(1 + 4D/I_{n}\right)^{1/2}\right)/2\right]$. The boundary value between high-frequency and low-frequency words is obtained by taking the square root of the number of different words in the text: $n^{*} = D^{1/2}$. This straightforward model was used successfully with both English and Chinese texts, demonstrating that the frequency of words and the number of same-frequency words are dependent only on the vocabulary of a text (the number of different words) but not on its length. Like Zipf's Law, the model may be universally applicable.
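
The bracketed Int[...] expression and the square-root boundary from the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, `Int` is assumed to mean truncation to an integer, and the quantity in the denominator is assumed to be the number of different words that share a given frequency (the abstract does not define it explicitly).

```python
import math
from collections import Counter


def estimate_frequency(num_distinct_words, num_same_frequency_words):
    """Evaluate Int[(-1 + (1 + 4D/I)^(1/2)) / 2].

    D = num_distinct_words, I = num_same_frequency_words
    (assumed: the count of different words sharing one frequency).
    """
    if num_distinct_words <= 0 or num_same_frequency_words <= 0:
        raise ValueError("both counts must be positive")
    ratio = 4 * num_distinct_words / num_same_frequency_words
    # int() truncates toward zero, which is the assumed meaning of Int[...]
    return int((-1 + math.sqrt(1 + ratio)) / 2)


def boundary_frequency(num_distinct_words):
    """Boundary between high- and low-frequency words: the square root of D."""
    return math.sqrt(num_distinct_words)


def same_frequency_counts(tokens):
    """Map each frequency n to the observed number of different words occurring n times."""
    word_counts = Counter(tokens)
    return Counter(word_counts.values())
```

For a tokenized text, `len(set(tokens))` gives D, and `same_frequency_counts(tokens)` gives observed counts that could be compared against the model's estimates. How the paper itself tokenizes English and Chinese text is not stated in the abstract.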

Year	DOI	Venue
1999	10.1002/(SICI)1097-4571(1999)50:33.3.CO;2-8	JASIS
Keywords	DocType	Volume
low frequency, Chinese, word frequency, English, research methodology, information science	Journal	50
Issue	ISSN	Citations
3	0002-8231	2
PageRank	References	Authors
0.57	4	3

Authors (3 rows)

Name	Order	Citations	PageRank
Qinglan Sun	1	4	0.96
Debora Shaw	2	57	7.37
Charles H. Davis	3	5	1.49
