Knowledge capture from multiple online sources with the extensible web retrieval toolkit (eWRT) - Citegraph

Paper Info

Title
Knowledge capture from multiple online sources with the extensible web retrieval toolkit (eWRT)

Abstract
Knowledge capture approaches in the age of massive Web data require robust and scalable mechanisms to acquire, consolidate and pre-process large amounts of heterogeneous data, both unstructured and structured. This paper addresses this requirement by introducing the Extensible Web Retrieval Toolkit (eWRT), a modular Python API for retrieving social data from Web sources such as Delicious, Flickr, Yahoo! and Wikipedia. eWRT has been released as an open source library under GNU GPLv3. It includes classes for caching and data management, and provides low-level text processing capabilities including language detection, phonetic string similarity measures, and string normalization.
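The abstract lists string normalization and phonetic string similarity among eWRT's low-level text processing features. The following is a minimal, self-contained sketch of what those two techniques typically involve. It is not eWRT's API. The function names `normalize`, `soundex` and `phonetic_similarity` are hypothetical. Soundex is an assumed choice of phonetic code, and scoring similarity as the Jaccard overlap of per-token Soundex codes is likewise an illustrative assumption rather than the paper's method.

```python
import re
import unicodedata

# American Soundex digit classes (assumed phonetic scheme, not taken from eWRT).
_SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def normalize(text: str) -> str:
    """Hypothetical normalizer: strip diacritics, collapse whitespace, lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", without_marks).strip().lower()


def soundex(word: str) -> str:
    """Four-character American Soundex code of a single word."""
    letters = [ch for ch in normalize(word) if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()
    prev = _SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        # Skip uncoded letters and repeats of the previous code.
        if digit and digit != prev:
            code += digit
        # 'h' and 'w' do not separate equal codes; vowels do.
        if ch not in "hw":
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def phonetic_similarity(a: str, b: str) -> float:
    """Hypothetical score: Jaccard overlap of the tokens' Soundex codes."""
    codes_a = {soundex(tok) for tok in normalize(a).split()} - {""}
    codes_b = {soundex(tok) for tok in normalize(b).split()} - {""}
    union = codes_a | codes_b
    return len(codes_a & codes_b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(normalize("  Zürich\tWeb  "))                      # zurich web
    print(soundex("Robert"), soundex("Rupert"))              # R163 R163
    print(phonetic_similarity("Arno Scharl", "Arnoh Sharl"))  # 1.0
```

Because normalization runs before phonetic coding, accented and unaccented spellings of the same name map to the same code. This is one plausible reason to pair the two steps when consolidating heterogeneous Web data.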

Year	DOI	Venue
2013	10.1145/2479832.2479861	K-CAP
Keywords	Field	DocType
string normalization, knowledge capture approach, social data, heterogeneous data, extensible web retrieval toolkit, web source, multiple online source, gnu gplv3, massive web data, phonetic string similarity measure, data management, social media, text mining, knowledge extraction, data acquisition	World Wide Web, Information retrieval, Computer science, Language identification, Knowledge extraction, Modular design, String metric, Data management, Python (programming language), Text processing, Scalability	Conference
Citations	PageRank	References
0	0.34	13
Authors
3

Authors (3 rows)

Name	Author order	Author citations	Author PageRank
Albert Weichselbraun	1	291	28.39
Arno Scharl	2	696	67.13
Heinz-Peter Lang	3	12	1.54

Cited by (0 rows)

References (13 rows)
