Title | ||
---|---|---|
HerCulB: Content-Based Information Extraction And Retrieval For Cultural Heritage Of The Balkans |
Abstract | ||
---|---|---|
Purpose - The purpose of this paper is to provide a methodology for automatic annotation of a multimedia collection of intangible cultural heritage mostly in the form of interviews. Assigned annotations provide a way to search the collection. Design/methodology/approach - Annotation is based on automatic extraction of metadata and is conducted by named entity and topic extraction from textual descriptions with a rule-based approach supported by vocabulary resources, a compiled domain-specific classification scheme and domain-oriented corpus analysis. Findings - The proposed methodology for automatic annotation of a collection of intangible cultural heritage, applied on the cultural heritage of the Balkans, has very good results according to F measure, which is 0.87 for the named entity and 0.90 for topic annotation. The overall methodology enables encapsulating domain-specific and language-specific knowledge into collections of finite state transducers and allows further improvements. Originality/value - Although cultural heritage has a significant role in the development of identity of a group or an individual, it is one of those specific domains that have not yet been fully explored in case of many languages. A methodology is proposed that can be used for incorporating natural language processing techniques into digital libraries of cultural heritage. |
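
The abstract reports results as an F measure without giving the underlying precision and recall. As a reminder of how such a score is typically derived from annotation counts, here is a minimal sketch. It assumes the balanced F1 variant (the abstract does not say which variant was used), and the function name `f_measure` and its parameters are hypothetical, not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw annotation counts.

    beta = 1.0 gives the balanced F1 score (assumed here; the paper's
    exact variant is not stated in the abstract).
    """
    predicted = true_positives + false_positives   # annotations the system produced
    actual = true_positives + false_negatives      # annotations in the gold standard
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0  # no correct annotations: score is defined as zero
    precision = true_positives / predicted
    recall = true_positives / actual
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, a system that produces 10 annotations of which 8 are correct, while missing 2 gold annotations, has precision 0.8 and recall 0.8. These counts are purely illustrative and are not results from the paper.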
Year | DOI | Venue |
---|---|---|
2020 | 10.1108/EL-03-2020-0052 | ELECTRONIC LIBRARY |
Keywords | DocType | Volume |
Information extraction, Content-based search, Natural language processing, Intangible cultural heritage | Journal | 38 |
Issue | ISSN | Citations |
5-6 | 0264-0473 | 0 |
PageRank | References | Authors |
0.34 | 0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ivana Tanasijevic | 1 | 0 | 0.34 |
Gordana Pavlovic-Lazetic | 2 | 35 | 7.82 |