Abstract | ||
---|---|---|
Human language, the most powerful communication system in history, is closely
associated with cognition. Written text is one of the fundamental
manifestations of language, and the study of its universal regularities can
give clues about how our brains process information and how we, as a society,
organize and share it. Still, only classical patterns such as Zipf's law have
been explored in depth. In contrast, other basic properties, such as the existence
of bursts of rare words in specific documents, the topical organization of
collections, or the sublinear growth of vocabulary size with the length of a
document, have been studied only in isolation, mainly with heuristic
methodologies rather than from basic principles and general mechanisms. As a
consequence, there is a lack of understanding of linguistic processes as
complex emergent phenomena. Beyond Zipf's law for word frequencies, here we
focus on Heaps' law, burstiness, and the topicality of document collections,
which encode correlations within and across documents that are absent from random null
models. We introduce and validate a generative model that explains the
simultaneous emergence of all these patterns from simple rules. As a result, we
find a connection between the bursty nature of rare words and the topical
organization of texts and identify dynamic word ranking and memory across
documents as key mechanisms explaining the nontrivial organization of written
text. Our research can have broad implications and practical applications in
computer science, cognitive science, and linguistics. |
Year | Venue | Keywords |
---|---|---|
2009 | Clinical Orthopaedics and Related Research | cognitive science, communication system, null model, word frequency |
Field | DocType | Volume |
---|---|---|
Zipf's law, Computer science, Artificial intelligence, Natural language processing, Cognition, Heuristic, Ranking, Word lists by frequency, Burstiness, Vocabulary, Linguistics, Machine learning, Generative model | Journal | abs/0902.0 |

Citations | PageRank | References |
---|---|---|
2 | 0.65 | 1 |
Authors | ||
---|---|---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
M. Ángeles Serrano | 1 | 257 | 17.84 |
Alessandro Flammini | 2 | 1705 | 94.69 |
Filippo Menczer | 3 | 3874 | 268.67 |