Title | ||
---|---|---|
A cooperative crowdsourcing framework for knowledge extraction in digital humanities - cases on Tang poetry. |
Abstract | ||
---|---|---|
Purpose - This paper proposes a knowledge extraction framework for extracting knowledge, including entities and the relationships between them, from unstructured texts in digital humanities (DH). Design/methodology/approach - The proposed cooperative crowdsourcing framework (CCF) combines human-computer cooperation with crowdsourcing to achieve high-quality, scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to help domain experts label and verify extracted knowledge. Findings - The case study shows that CCF can effectively and efficiently extract knowledge from multi-sourced heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher knowledge extraction accuracy than state-of-the-art methods; the active learning mechanism can maximize the contribution of feedback to the training model; and the proposed category-based crowdsourcing mechanism can scale up effective human-computer collaboration by considering the specialization of workers in different categories of tasks. Research limitations/implications - This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts. Practical implications - The extracted knowledge is machine-understandable and can support research on Tang poetry and knowledge-driven intelligent applications in DH. Originality/value - CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms; the human-computer cooperation method uses the feedback of domain experts through the active learning mechanism; the category-based crowdsourcing mechanism considers the matching between categories of DH data and the specialties of domain experts. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1108/AJIM-07-2019-0192 | ASLIB JOURNAL OF INFORMATION MANAGEMENT |
Keywords | DocType | Volume |
---|---|---|
Crowdsourcing, Human-computer cooperation, Knowledge extraction, Digital humanities, Tang poetry | Journal | 72 |
Issue | ISSN | Citations |
---|---|---|
SP2.0 | 2050-3806 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Liang Hong | 1 | 193 | 33.79 |
Wenjun Hou | 2 | 0 | 0.68 |
Zonghui Wu | 3 | 0 | 0.34 |
Huijie Han | 4 | 0 | 0.34 |