Abstract |
---|
About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon. The word frequency distributions produced by this model are shown to match the observations both analytically and empirically. |
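
The abstract does not spell out the proposed generative model, so the sketch below does not attempt to reproduce it. It only illustrates, under stated assumptions, the distinction the abstract draws between "a straight line on a log-log plot" and "two differently sloped straight lines joined by a smooth curve". A pure Zipf curve f(r) = C r^(-s) has the same local log-log slope, -s, at every rank. A hypothetical smooth blend of two power laws instead has a slope that drifts from about -s1 to about -s2, which makes the curve concave. The functions `local_log_log_slopes`, `zipf`, and `two_regime`, and the blend formula itself, are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import List


def local_log_log_slopes(freqs: List[float]) -> List[float]:
    """Secant slopes between consecutive points (log r, log f_r) of a rank-frequency curve.

    freqs[0] is the frequency of the rank-1 (most frequent) word, freqs[1] of rank 2, etc.
    A pure power law f(r) = C * r**(-s) yields the constant slope -s everywhere;
    slopes that keep decreasing with rank indicate a concave log-log curve.
    """
    slopes = []
    for r in range(1, len(freqs)):
        # freqs[r - 1] sits at rank r and freqs[r] at rank r + 1
        dx = math.log(r + 1) - math.log(r)
        dy = math.log(freqs[r]) - math.log(freqs[r - 1])
        slopes.append(dy / dx)
    return slopes


def zipf(n: int, s: float, c: float = 1e6) -> List[float]:
    """Pure Zipf curve f(r) = c * r**(-s) for ranks 1..n (a straight line on a log-log plot)."""
    return [c * r ** (-s) for r in range(1, n + 1)]


def two_regime(n: int, s1: float, s2: float, knee: float) -> List[float]:
    """Hypothetical smooth blend of two power laws (NOT the paper's generative model).

    f(r) = r**(-s1) * (1 + r / knee) ** (-(s2 - s1)).
    Its log-log slope is about -s1 for r << knee and about -s2 for r >> knee,
    so for s2 > s1 the curve looks like two straight lines joined by a smooth bend.
    """
    return [r ** (-s1) * (1 + r / knee) ** (-(s2 - s1)) for r in range(1, n + 1)]


if __name__ == "__main__":
    pure = local_log_log_slopes(zipf(1000, s=1.0))
    bent = local_log_log_slopes(two_regime(100_000, s1=1.0, s2=2.0, knee=100))
    print(f"pure Zipf:  first slope {pure[0]:.3f}, last slope {pure[-1]:.3f}")
    print(f"two-regime: first slope {bent[0]:.3f}, last slope {bent[-1]:.3f}")
```

For the pure Zipf curve the first and last local slopes coincide. For the two-regime curve they move from roughly -s1 toward -s2, and the slopes decrease monotonically with rank, which is exactly what concavity on a log-log plot means.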

Year | DOI | Venue |
---|---|---|
2017 | 10.1145/3077136.3080821 | SIGIR |

Field | DocType | ISBN |
---|---|---|
Line (geometry), Zipf's law, Empirical distribution function, Word lists by frequency, Computer science, Speech recognition, Phenomenon, Smoothness, Power law, Generative model | Conference | 978-1-4503-5022-8 |

Citations | PageRank | References |
---|---|---|
1 | 0.35 | 17 |

Authors |
---|
3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Flavio Chierichetti | 1 | 626 | 39.42 |
Ravi Kumar | 2 | 13932 | 1642.48 |
Bo Pang | 3 | 5795 | 451.00 |