Abstract |
---|
Given the heterogeneity of the World Wide Web, using metadata on the search-engine side appears to be a promising direction for information retrieval. However, because manually annotating documents at Web scale is not feasible, this direction has rarely been pursued. We propose a semi-automatic method for propagating metadata. In the first step, homogeneous corpora are extracted by a clustering procedure that uses a similarity measure based on the co-citation frequency between pages. Our study considers the following properties: the authority type, the site type, the information type, and the page type. Given the cluster hierarchy, the second step selects a small number of documents to be manually annotated and propagates their metadata values to the other documents in the same cluster. A qualitative evaluation and a preliminary study of the method's scalability are presented. |
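
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact similarity formula or clustering algorithm, so the sketch assumes a Jaccard-normalized co-citation count and replaces the cluster hierarchy with a single-link grouping cut at a fixed threshold. The function names, the toy link graph, and the `academic`/`commercial` site-type values are all hypothetical.

```python
from itertools import combinations


def cocitation_similarity(links):
    """links maps a citing page to the set of pages it cites.

    Returns {(a, b): similarity} for pairs co-cited at least once.
    Assumption: similarity = |citers(a) & citers(b)| / |citers(a) | citers(b)|.
    """
    citers = {}
    for source, targets in links.items():
        for target in targets:
            citers.setdefault(target, set()).add(source)
    sims = {}
    for a, b in combinations(sorted(citers), 2):
        common = len(citers[a] & citers[b])
        if common:
            sims[(a, b)] = common / len(citers[a] | citers[b])
    return sims


def cluster(pages, sims, threshold):
    """Group pages linked by a similarity >= threshold (single-link cut)."""
    parent = {p: p for p in pages}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for (a, b), s in sims.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for p in pages:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())


def propagate(clusters, manual_labels):
    """Copy the value of one manually labeled page to its whole cluster."""
    labels = dict(manual_labels)
    for members in clusters:
        seeds = sorted(p for p in members if p in manual_labels)
        if seeds:
            value = manual_labels[seeds[0]]
            for p in members:
                labels.setdefault(p, value)
    return labels


# Toy link graph (hypothetical): four citing pages, four cited pages.
links = {
    "hub1": {"a", "b"},
    "hub2": {"a", "b"},
    "hub3": {"c", "d"},
    "hub4": {"c", "d", "a"},
}
pages = {t for targets in links.values() for t in targets}
sims = cocitation_similarity(links)
clusters = cluster(pages, sims, threshold=0.5)
# Only one page per cluster is labeled by hand (hypothetical site types).
labels = propagate(clusters, {"a": "academic", "c": "commercial"})
```

In this toy graph, pages `a` and `b` share two of their three citing pages and `c` and `d` share both, so a threshold of 0.5 yields two clusters, and the two manual labels are enough to cover all four pages. A page in a cluster with no manually labeled member would stay unlabeled.
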
Year | DOI | Venue |
---|---|---|
2005 | 10.1109/WI.2005.95 | Web Intelligence |
Keywords | Field | DocType |
information type,metadata propagation,information retrieval,page type,metadata value,propagating metadata,site type,web scale,cluster hierarchy,authority type,world wide web,citation analysis,transcoding,multimedia,meta data,search engine,internet,search engines | Data mining,Metadata,Metadata repository,World Wide Web,Search engine,Similarity measure,Information retrieval,Computer science,Citation analysis,Synonym ring,The Internet,Scalability | Conference |
ISBN | Citations | PageRank |
0-7695-2415-X | 3 | 0.53 |
References | Authors | |
5 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Camille Prime-Claverie | 1 | 13 | 2.05 |
Michel Beigbeder | 2 | 72 | 23.49 |
T. Lafouge | 3 | 68 | 10.97 |