Title
A cooperative crowdsourcing framework for knowledge extraction in digital humanities - cases on Tang poetry.
Abstract
Purpose: The purpose of this paper is to propose a framework for extracting knowledge, including entities and the relationships between them, from unstructured texts in digital humanities (DH).
Design/methodology/approach: The proposed cooperative crowdsourcing framework (CCF) combines human-computer cooperation and crowdsourcing to achieve high-quality, scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism that helps domain experts label and verify extracted knowledge.
Findings: The case study shows that CCF can effectively and efficiently extract knowledge from multi-sourced heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher knowledge extraction accuracy than state-of-the-art methods. The active learning mechanism maximizes the contribution of feedback to the training model. The category-based crowdsourcing mechanism scales up effective human-computer collaboration by taking into account workers' specialization in different categories of tasks.
Research limitations/implications: This research proposes CCF to enable high-quality, scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing the relevant domain knowledge and experts.
Practical implications: The extracted knowledge is machine-understandable and can support research on Tang poetry as well as knowledge-driven intelligent applications in DH.
Originality/value: CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms. Its human-computer cooperation method uses the feedback of domain experts through the active learning mechanism. Its category-based crowdsourcing mechanism considers the matching between categories of DH data and workers, especially domain experts.
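The abstract names two mechanisms but does not specify them: active learning that decides which extractions to send to experts, and category-based task assignment. The following is a minimal sketch under stated assumptions, not the authors' method. It uses least-confidence uncertainty sampling as a stand-in selection criterion. It routes each task to the worker with the highest recorded accuracy on that task's category. The function names `least_confident` and `assign_by_category`, the category labels ("person", "place"), the worker names and the per-category accuracy table are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def least_confident(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical selection step: return the indices of the k candidate
    extractions whose top predicted label probability is lowest
    (least-confidence uncertainty sampling, an assumed stand-in)."""
    confidence = [max(p) for p in probabilities]
    # Sort candidates from least to most confident and keep the first k.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


def assign_by_category(
    tasks: Sequence[Tuple[str, str]],
    accuracy: Dict[str, Dict[str, float]],
) -> Dict[str, str]:
    """Hypothetical category-based matching: map each (task_id, category)
    to the worker with the highest recorded accuracy on that category.
    A worker with no record for a category is treated as 0.0; ties go to
    the first worker in the dictionary."""
    assignment: Dict[str, str] = {}
    for task_id, category in tasks:
        best = max(accuracy, key=lambda w: accuracy[w].get(category, 0.0))
        assignment[task_id] = best
    return assignment


if __name__ == "__main__":
    # Made-up model outputs for three candidate extractions (two labels each).
    probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]]
    print(least_confident(probs, 2))  # [1, 2]

    # Made-up per-category accuracies for two hypothetical experts.
    acc = {
        "expert_a": {"person": 0.9, "place": 0.6},
        "expert_b": {"person": 0.7, "place": 0.95},
    }
    print(assign_by_category([("t1", "person"), ("t2", "place")], acc))
```

In this sketch, selection and assignment are kept as separate steps. The model first chooses the extractions it is least sure about, and only those are routed to whichever expert has performed best on that category of task.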
Year: 2020
DOI: 10.1108/AJIM-07-2019-0192
Venue: Aslib Journal of Information Management
Keywords: Crowdsourcing, Human-computer cooperation, Knowledge extraction, Digital humanities, Tang poetry
DocType: Journal
Volume: 72
Issue: SP2.0
ISSN: 2050-3806
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Liang Hong | 1 | 193 | 33.79
Wenjun Hou | 2 | 0 | 0.68
Zonghui Wu | 3 | 0 | 0.34
Huijie Han | 4 | 0 | 0.34