Title
On the Power Laws of Language: Word Frequency Distributions
Abstract
About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon. The word frequency distributions produced by this model are shown to match the observations both analytically and empirically.
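The "straight line on a log-log plot" reading of Zipf's law can be checked numerically. The sketch below is only an illustration and is not the generative model proposed in the paper. It assumes the standard power-law form, frequency proportional to rank raised to a negative exponent, and the helper names `rank_frequency` and `loglog_slope` are hypothetical. It fits a least-squares slope to log(frequency) against log(rank).

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs, first_rank=1):
    """Least-squares slope of log(frequency) against log(rank).

    freqs      -- positive frequencies, ordered by rank (at least two values)
    first_rank -- rank of freqs[0]; lets a tail segment keep its true ranks
    """
    xs = [math.log(first_rank + i) for i in range(len(freqs))]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data (an assumption, not a corpus): frequency proportional to 1/rank.
ideal = [1000.0 / r for r in range(1, 201)]
head_slope = loglog_slope(ideal[:100])
tail_slope = loglog_slope(ideal[100:], first_rank=101)
```

For a pure power law the head and tail slopes agree. On a corpus whose log-log curve is concave, as the abstract describes, the tail slope would be expected to be more negative than the head slope, which is one rough way to see the two differently sloped regimes.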
Year: 2017
DOI: 10.1145/3077136.3080821
Venue: SIGIR
Field: Line (geometry), Zipf's law, Empirical distribution function, Word lists by frequency, Computer science, Speech recognition, Phenomenon, Smoothness, Power law, Generative model
DocType: Conference
ISBN: 978-1-4503-5022-8
Citations: 1
PageRank: 0.35
References: 17
Authors: 3
Name | Order | Citations | PageRank
Flavio Chierichetti | 1 | 626 | 39.42
Ravi Kumar | 2 | 13932 | 1642.48
Bo Pang | 3 | 5795 | 451.00