Title
Term frequency dynamics in collaborative articles
Abstract
Documents on the World Wide Web are dynamic entities. Mainstream information retrieval systems and techniques are primarily focused on the latest version of a document, generally ignoring its evolution over time. In this work, we study the term frequency dynamics in web documents over their lifespan. We use Wikipedia as a document collection because it is a broad and public resource and, more importantly, because it provides access to the complete revision history of each document. We investigate the progression of similarity values over two projection variables, namely revision order and revision date. Based on this investigation, we find that term frequency in encyclopedic documents (i.e., documents that are comprehensive and focused on a single topic) exhibits a rapid and steady progression towards the document's current version. The content in early versions quickly becomes very similar to the present version of the document.
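The comparison described in the abstract, similarity between each revision and the current version tracked over revision order, can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the tokenization, so cosine similarity over raw term counts and a simple regex tokenizer are assumptions here, and the function names and the toy revision history are hypothetical rather than taken from the paper.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphanumeric tokens in one revision's text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_progression(revisions):
    """Similarity of every revision to the last one, in revision order."""
    current = term_frequencies(revisions[-1])
    return [cosine_similarity(term_frequencies(r), current) for r in revisions]


# Hypothetical toy revision history (assumption: not data from the paper).
revisions = [
    "Porto is a city.",
    "Porto is a city in Portugal.",
    "Porto is a coastal city in northwest Portugal.",
]
for order, sim in enumerate(similarity_progression(revisions), start=1):
    print(order, round(sim, 3))
```

Projecting over revision date instead of revision order would only change the x-axis: each similarity value would be paired with the revision's timestamp rather than its position in the list.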
Year: 2010
DOI: 10.1145/1860559.1860620
Venue: ACM Symposium on Document Engineering
Keywords: revision order, latest version, early version, collaborative article, encyclopedic document, present version, document collection, current version, web document, term frequency dynamic, revision date, complete revision history, information retrieval system, term frequency, wikipedia, world wide web
Field: World Wide Web, Information retrieval, Computer science, Mainstream, Database
DocType: Conference
Citations: 0
PageRank: 0.34
References: 7
Authors: 3
Name, Order, Citations, PageRank
Sérgio Nunes, 1, 122, 17.53
Cristina Ribeiro, 2, 74, 7.91
Gabriel David, 3, 131, 11.89