Abstract | ||
---|---|---|
The colossal growth of volatile online text data creates a demand for automatic text analysis tools that identify worthwhile information. Documents, as well as text streams, can be structured beyond the concept of frequency distributions. Here we introduce a novel method that provides a relative measure of information value over a time series mapped by a dynamic trie structure. We adapt the concept of entropy to textual data and employ a compression-based estimation method. The algorithm can run in a real-time scenario because it has linear complexity and is based on a dynamic history of predefined size. We demonstrate the suitability of our method on an experimental dataset and compare our results with an existing approach. Our results reveal structural properties of the texts and permit deeper analysis of the presumed information peaks. |
Year | DOI | Venue |
---|---|---|
2013 | 10.1007/978-3-642-53862-9_59 | COMPUTER AIDED SYSTEMS THEORY, PT II |
Keywords | Field | DocType |
document analysis, information retrieval, entropy estimation, data compression, trie data structure | Entropy estimation, Frequency distribution, Text mining, Document analysis, Information retrieval, Computer science, Theoretical computer science, Artificial intelligence, Natural language processing, Data compression, Trie | Conference |
Volume | ISSN | Citations |
8112 | 0302-9743 | 0 |
PageRank | References | Authors |
0.34 | 11 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Thomas Bohne | 1 | 7 | 1.18 |
Uwe M. Borghoff | 2 | 412 | 175.51 |