Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data - Citegraph

Paper Info

Title
Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data

Abstract
The abstract of a scientific paper distills the contents of the paper into a short paragraph. In the biomedical literature, it is customary to structure an abstract into discourse categories such as BACKGROUND, OBJECTIVE, METHOD, RESULT, and CONCLUSION, but this segmentation is uncommon in other fields such as computer science. Explicit categories could support more granular, that is, discourse-level, search and recommendation. The sparsity of labeled data makes it challenging to build supervised machine learning solutions for automatically segmenting abstracts at the discourse level in non-biomedical domains. In this paper, we address this problem using transfer learning. We define three discourse categories for an abstract, namely BACKGROUND, TECHNIQUE, and OBSERVATION, because these are the most common. We train a deep neural network on structured abstracts from PubMed, then fine-tune it on a small hand-labeled corpus of computer science papers. We observe an accuracy of 75% on the test corpus of computer science papers. We also perform an ablation study to highlight the roles of the different parts of the model. Our method appears to be a promising solution for automatically segmenting abstracts where labeled data is sparse.

Year: 2020
DOI: 10.1145/3383583.3398598
Venue: JCDL '20: The ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 2020
Document type: Conference
ISBN: 978-1-4503-7585-6
Citations: 1
PageRank: 0.35
References: 3
Authors: 5

Authors

Name	Author order	Citations	PageRank
Soumya Jyoti Banerjee	1	5	3.95
Debarshi Kumar Sanyal	2	36	10.62
Samiran Chattopadhyay	3	174	34.02
Plaban Kumar Bhowmick	4	20	8.62
Das Parthapratim	5	1	0.35
