Development of an intelligent distributed news retrieval system - Citegraph

Paper Info

Title
Development of an intelligent distributed news retrieval system

Abstract
Current web news retrieval systems share a common difficulty: web-based news retrieval requires quickly and accurately processing a very large amount of data that is constantly being updated. In this paper, we present the development of an intelligent distributed web news retrieval system whose goal is to accurately retrieve and organize web news information. It includes: a novel optimized crawler algorithm whose fetching speed is several times faster than that of a traditional crawler; a keen tag-based extraction algorithm that extracts data-rich content with minimal manual effort and also classifies data as important or not important, so that the crawler can revisit and update important data; and a modified MapReduce, improved by estimating the execution time of each subtask, which is shown to reduce the number of unusual tasks and shorten the overall job execution time.

Year	DOI	Venue
2012	10.3233/KES-2011-0237	KES Journal
Keywords	Field	DocType
execution time,extraction algorithm,web news retrieval system,crawler algorithm,web-based news retrieval,important data,available web news retrieval,web news information,data rich content,traditional crawler,web crawler	Information retrieval,Web news,Extraction algorithm,Computer science,Artificial intelligence,Execution time,Web crawler,Machine learning	Journal
Volume	Issue	ISSN
16	2	1327-2314
Citations	PageRank	References
1	0.35	18
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
James N. K. Liu	1	529	44.35
K. C. Choi	2	1	0.35
J. Y. Chai	3	1	0.35
