Title
Content independent metadata production as a machine learning problem
Abstract
Metadata provide a high-level description of digital library resources and are key to enabling the discovery and selection of suitable resources. However, the growth in size and diversity of digital collections makes manual metadata extraction an expensive task. This paper proposes a new content-independent method to automatically generate metadata that characterize resources in a given learning object repository. The key idea is to rely on a few existing metadata to learn predictive models of metadata values. The proposed method is content independent and handles resources in different formats: text, image, video, Java applet, etc. Two classical machine learning approaches are studied in this paper. In the first approach, a supervised machine learning technique classifies each value of a metadata field to be predicted according to the other metadata fields filled in a priori. The second approach uses the FP-Growth algorithm to discover relationships between the different metadata fields in the form of association rules. Experiments on two well-known educational data repositories show that both approaches can enhance metadata extraction and can even fill subjective metadata fields that are difficult to extract from the content of a resource, such as its difficulty.
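The second approach (mining association rules between metadata fields, then using them to fill a missing field such as difficulty) can be illustrated with a small sketch. This is not the authors' implementation. Assumptions: each record is encoded as a transaction of "field=value" items; the records, the field names (format, type, audience, difficulty), the helper names (to_items, frequent_itemsets, rules_for_field, predict) and the support/confidence thresholds are all hypothetical. Frequent itemsets are found by brute-force subset counting, a swapped-in stand-in for FP-Growth that yields the same itemsets but only scales to a handful of fields.

```python
from collections import Counter
from itertools import combinations

# Hypothetical learning-object metadata records (field -> value).
# Illustrative only; these are not records from the paper's repositories.
RECORDS = [
    {"format": "video", "type": "lecture", "audience": "university", "difficulty": "medium"},
    {"format": "video", "type": "lecture", "audience": "university", "difficulty": "medium"},
    {"format": "applet", "type": "simulation", "audience": "school", "difficulty": "easy"},
    {"format": "applet", "type": "simulation", "audience": "school", "difficulty": "easy"},
    {"format": "text", "type": "exercise", "audience": "university", "difficulty": "hard"},
    {"format": "text", "type": "exercise", "audience": "university", "difficulty": "hard"},
    {"format": "text", "type": "lecture", "audience": "university", "difficulty": "medium"},
]


def to_items(record):
    """Encode a metadata record as a transaction of 'field=value' items."""
    return frozenset(f"{field}={value}" for field, value in record.items())


def frequent_itemsets(transactions, min_support):
    """Map every itemset with support >= min_support to its support.

    Stand-in for FP-Growth: brute-force subset counting returns the same
    frequent itemsets, without building an FP-tree."""
    counts = Counter()
    for t in transactions:
        for size in range(1, len(t) + 1):
            for combo in combinations(sorted(t), size):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def rules_for_field(itemsets, field, min_conf):
    """Build rules 'antecedent -> field=value' with confidence >= min_conf."""
    rules = []
    for itemset, support in itemsets.items():
        targets = [item for item in itemset if item.startswith(field + "=")]
        if len(targets) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - {targets[0]}
        # Every subset of a frequent itemset is frequent, so this lookup succeeds.
        confidence = support / itemsets[antecedent]
        if confidence >= min_conf:
            rules.append((antecedent, targets[0], confidence, support))
    # Try the most confident, best-supported, most specific rules first.
    rules.sort(key=lambda r: (r[2], r[3], len(r[0])), reverse=True)
    return rules


def predict(record, rules):
    """Fill a missing field with the consequent of the first matching rule."""
    items = to_items(record)
    for antecedent, consequent, confidence, _ in rules:
        if antecedent <= items:
            return consequent.split("=", 1)[1], confidence
    return None, 0.0


transactions = [to_items(r) for r in RECORDS]
itemsets = frequent_itemsets(transactions, min_support=0.25)
difficulty_rules = rules_for_field(itemsets, "difficulty", min_conf=0.8)
print(predict({"format": "text", "type": "exercise", "audience": "university"}, difficulty_rules))
```

On these toy records the call prints ('hard', 1.0): the rule {format=text, type=exercise, audience=university} -> difficulty=hard has confidence 1.0, while the weaker rule format=text -> difficulty=hard (confidence 2/3) is filtered out by the 0.8 threshold. Ranking rules by confidence, then support, then antecedent size is one simple policy for choosing among competing rules; other selection strategies are equally plausible.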
Year
2012
DOI
10.1007/978-3-642-31537-4_24
Venue
MLDM
Keywords
classical machine, different format, new content independent method, metadata field, metadata extraction, metadata value, manual metadata extraction, different metadata field, content independent metadata production, subjective metadata field, existing metadata, machine learning, association rules
Field
Metadata repository, Metadata, Information retrieval, Geospatial metadata, Meta Data Services, Data element, Computer science, Marker interface pattern, Artificial intelligence, Metadata modeling, Machine learning, Database catalog
DocType
Conference
Citations
1
PageRank
0.35
References
17
Authors
2
Name              Order  Citations  PageRank
Sahar Changuel    1      26         2.76
Nicolas Labroche  2      139        17.87