Title
Cited text spans identification with an improved balanced ensemble model.
Abstract
Scientific summarization aims to provide a condensed summary of the important contributions of scientific papers. The problem has been extensively explored, and recent work has turned to exploiting cited text spans to generate summaries. Cited text spans are the passages in the cited paper that most accurately reflect a citation. They can be viewed as important aspects of the cited paper, as annotated by the academic community. Hence, identifying cited text spans is of vital importance for providing a different kind of scientific summarization. In this paper, we explore three potential improvements to our previous work, a two-layer ensemble model for the cited text spans identification problem. We first view cited text spans identification as an imbalanced classification problem and compare preprocessing methods for handling the imbalanced dataset. We then propose the RANdom Sampling Aggregating (RANSA) algorithm to train the classifiers in the first layer of the ensemble. Finally, an improved stacking framework, Hybrid-Stacking, is applied to combine the first-layer models. Our new ensemble model overcomes flaws of the previous work and shows improved performance on cited text spans identification.
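The abstract does not spell out how RANSA works, so the following is only a minimal sketch of the general idea its name suggests: train several classifiers, each on a balanced subset obtained by randomly undersampling the majority class, then aggregate their predictions by majority vote. The function names (`train_centroid`, `balanced_bagging`, `vote`), the nearest-centroid base learner, and the 0/1 label encoding are all illustrative assumptions and are not taken from the paper. The Hybrid-Stacking layer is not covered.

```python
import random
from statistics import mean

def train_centroid(samples):
    """Fit a nearest-centroid classifier (hypothetical base learner).

    samples is a list of (feature_vector, label) pairs.
    """
    centroids = {}
    for label in {y for _, y in samples}:
        rows = [x for x, y in samples if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=dist)

def balanced_bagging(samples, n_models=5, seed=0):
    """Train n_models classifiers, each on a randomly undersampled balanced subset.

    Assumption: labels are 0 and 1 and both classes are present.
    """
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    models = []
    for _ in range(n_models):
        # Keep every minority example; draw an equally sized majority sample.
        subset = minority + rng.sample(majority, len(minority))
        models.append(train_centroid(subset))
    return models

def vote(models, x):
    """Aggregate the base classifiers by simple majority vote."""
    votes = [predict_centroid(m, x) for m in models]
    return 1 if sum(votes) * 2 > len(votes) else 0
```

Undersampling keeps each base classifier from being dominated by the majority class, while training several of them on different random subsets recovers some of the information that a single undersampled training set would discard.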
Year: 2019
DOI: 10.1007/s11192-019-03167-z
Venue: Scientometrics
Keywords: Scientific summarization, Cited text spans, Ensemble, Stacking
Field: Automatic summarization, Data mining, Ensemble forecasting, Information retrieval, Computer science, Citation, Preprocessor, Sampling (statistics), Academic community, Parameter identification problem
DocType: Journal
Volume: 120
Issue: 3
ISSN: 0138-9130
Citations: 0
PageRank: 0.34
References: 0
Authors: 5

Name            Order  Citations  PageRank
Pancheng Wang   1      1          1.71
Shasha Li       2      85         20.31
Haifang Zhou    3      35         9.33
Jintao Tang     4      89         14.00
Ting Wang       5      36         9.43