Title
Clusterisation du Web en vue d'extraction de corpus homogènes (Clustering the Web for the extraction of homogeneous corpora)
Abstract
Web resources are increasingly diverse, not only in thematic content but also in document type, geographic origin, level, language, and so on. However, web search engines do not take this heterogeneity into account and offer only thematic, keyword-based access to documents. This paper presents a method for extracting homogeneous corpora of web documents. The method is based on link analysis, uses the co-citation method, and focuses in particular on the notion of web document type.
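As a rough illustration of the co-citation idea named in the abstract, the sketch below counts how many source pages link to both members of each pair of pages. It is only a minimal example under stated assumptions, not the authors' method: the function name `cocitation_counts`, the dictionary-of-outlinks input format, and the toy page identifiers are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(outlinks):
    """Count, for each unordered pair of pages, how many source pages link to both.

    `outlinks` maps a source page to the list of pages it links to
    (an assumed input format chosen for this illustration).
    """
    counts = Counter()
    for source, targets in outlinks.items():
        # Collapse duplicate links and ignore self-links.
        cited = sorted(set(targets) - {source})
        # Every pair of pages cited together by this source is co-cited once.
        for a, b in combinations(cited, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy link graph: three source pages citing pages "a", "b", "c".
toy_links = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c"],
}
counts = cocitation_counts(toy_links)
```

Pairs with a high count are cited together often, which is the signal a co-citation approach would use to group pages; how the paper turns such counts into homogeneous corpora is not specified in this record.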
Year
2002
Venue
INFORSID
Keywords
co-citation method, genre of web document, link analysis, entropy, web graph, page typology
Field
Homogenes, Humanities, Art, Art history
DocType
Conference
Citations
2
PageRank
0.46
References
15
Authors
3
Name                    Order  Citations/PageRank
Camille Prime-Claverie  1      132.05
Michel Beigbeder        2      7223.49
T. Lafouge              3      6810.97