Title
Design of local web content observatory system
Abstract
The amount of information on the web is growing rapidly. However, for a particular group or country, it is very difficult to know how much relevant web content is published, in which languages, and on which subjects. Knowing the status of the local web content of a country or culture is critical for making policy and strategy decisions for the development of a multilingual and multicultural web. This research therefore designs a model for a local web content observatory system that measures the quantity and quality of web content across different domains. The local web content observatory system consists of six components: the crawler, content extractor, statistical tracker, language identifier, web document categorizer, and report generator. Although the model is generic and can be applied to any country or culture, we tested and evaluated the system on all domains hosted under the .et domain (Ethiopia's country-code top-level domain). About two thousand seed URLs under the .et domain were used, and the crawler collected 263,031 web documents. The language identifier achieved an accuracy of 98.67%. To evaluate the effectiveness of the local web content categorizer, precision, recall, and F-measure tests were conducted: for English documents, the categorizer obtained an average precision of 91.7%, recall of 97.2%, and F-measure of 94.25%; for Amharic documents, it obtained a precision of 91.7%, recall of 87.85%, and F-measure of 86.65%. The average accuracy of the statistical tracker is 98.72%.
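The categorizer is evaluated with precision P, recall R, and the F-measure F = 2PR / (P + R). The Python below is a minimal sketch of how such figures can be computed from per-category counts of true positives, false positives, and false negatives. The abstract does not give those counts or say how the per-category scores were averaged, so macro-averaging (a plain mean over categories) is an assumption here, and the category names and counts in the demo are hypothetical. The demo also shows that an averaged F-measure can differ from the F-measure computed from averaged precision and recall.

```python
from typing import Dict, Tuple

# Per-category counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean of P and R) for one category."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Average per-category P, R and F1 so that every category weighs the same.

    Macro-averaging is an assumption; the paper's averaging scheme is not stated.
    """
    if not counts:
        raise ValueError("need at least one category")
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    p = sum(s[0] for s in scores) / n
    r = sum(s[1] for s in scores) / n
    f = sum(s[2] for s in scores) / n
    return p, r, f


if __name__ == "__main__":
    # Hypothetical counts for two made-up categories (not data from the paper).
    demo = {"news": (9, 1, 0), "sport": (1, 0, 1)}
    p, r, f = macro_average(demo)
    # Harmonic mean of the averaged P and R, for comparison with averaged F1.
    f_of_averages = 2 * p * r / (p + r)
    print(f"P={p:.3f} R={r:.3f} F1(macro)={f:.3f} F1(of averages)={f_of_averages:.3f}")
    # prints: P=0.950 R=0.750 F1(macro)=0.807 F1(of averages)=0.838
```

Macro-averaging gives every category equal weight regardless of how many documents it holds; micro-averaging would instead pool the counts of all categories before computing a single P, R, and F.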
Year
2015
Venue
PROCEEDINGS OF THE 2015 12TH IEEE AFRICON INTERNATIONAL CONFERENCE - GREEN INNOVATION FOR AFRICAN RENAISSANCE (AFRICON)
Keywords
Local Web Content Observatory, Crawler, Language Identification, Web Document Categorization, Information Retrieval
Field
Web design, Web search engine, Static web page, Web mining, Information retrieval, Web page, Computer science, Web standards, Data Web, Web crawler
DocType
Conference
ISSN
2153-0025
Citations
0
PageRank
0.34
References
2
Authors
2
Name: Gashaw Tsegaye; Order: 1; Citations: 0; PageRank: 0.34
Name: Solomon Atnafu; Order: 2; Citations: 66; PageRank: 12.13