Title
Web page classification based on Schema.org collection
Abstract
The Internet is a library holding a huge amount of information, and its content needs to be categorized through web page classification. Classifying web page content can improve both the quality and the accuracy of web search. Unfortunately, the high dimensionality of web page datasets makes classification difficult. An automatic method for web page classification can simplify the whole process and help search engines return more relevant results. Today, information on the web is generally structured and formatted informally. This lack of semantics has led to formal methods that add semantic information to web pages. The search engines Bing, Google, Yahoo! and Yandex created Schema.org, a collection of schemas, to support web page semantics and improve their search results. This paper explores the use of formal source code structure to classify a large collection of web content. It focuses on using the Schema.org collection of schemas to classify web pages and categorize them unambiguously.
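The abstract does not spell out the classification procedure. As a rough illustration only, and not the authors' method, the sketch below assumes that pages carry Schema.org markup as HTML microdata (`itemtype` attributes) and labels each page with its most frequent Schema.org type. The names `SchemaTypeCollector` and `classify_page` are hypothetical, and JSON-LD and RDFa markup are deliberately ignored to keep the example short.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical assumption: only microdata itemtype URLs under schema.org count.
SCHEMA_PREFIXES = ("http://schema.org/", "https://schema.org/")


class SchemaTypeCollector(HTMLParser):
    """Collects Schema.org type names from microdata itemtype attributes."""

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes such as itemscope arrive with value None.
            if name == "itemtype" and value:
                # An itemtype attribute may list several space-separated URLs.
                for url in value.split():
                    for prefix in SCHEMA_PREFIXES:
                        if url.startswith(prefix):
                            self.types.append(url[len(prefix):])


def classify_page(html):
    """Return the most frequent Schema.org type on the page, or None if unmarked."""
    parser = SchemaTypeCollector()
    parser.feed(html)
    parser.close()
    if not parser.types:
        return None
    return Counter(parser.types).most_common(1)[0][0]


if __name__ == "__main__":
    page = (
        '<div itemscope itemtype="http://schema.org/Product">'
        '<span itemprop="name">Lamp</span>'
        '<div itemprop="offers" itemscope itemtype="http://schema.org/Offer"></div>'
        '</div>'
        '<div itemscope itemtype="https://schema.org/Product"></div>'
    )
    print(classify_page(page))
```

Run on the sample page, the two `Product` items outnumber the single `Offer`, so the page is labelled `Product`. A majority vote is only the simplest choice here; a real classifier could weight top-level items over nested ones or combine the Schema.org types with other page features.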
Year
2012
DOI
10.1109/CASoN.2012.6412428
Venue
Computational Aspects of Social Networks
Keywords
Internet, Web sites, pattern classification, search engines, search problems, source coding, Bing, Google, Web page content classification, Web page dataset dimensionality, Web page semantics, Web search quality, Yahoo!, Yandex, content categorization, formal methods, formal source code structure, information library, schema.org collection, search engine, semantics information, Collection of schemas Schema.org, Genres, Microformats, Microgenres, Web Page Classification
Field
Web search engine, Web development, Static web page, Data mining, World Wide Web, Information retrieval, Web page, Computer science, Web modeling, Backlink, Page view, Web crawler
DocType
Conference
ISSN
2155-7047
ISBN
978-1-4673-4793-8
Citations
3
PageRank
0.41
References
4
Authors
3
Name | Order | Citations | PageRank
Jonas Krutil | 1 | 3 | 0.41
Milos Kudelka | 2 | 116 | 23.81
Václav Snasel | 3 | 1261 | 210.53