Paper Info

Title
Structuralizing biomedical abstracts with discriminative linguistic features.

Abstract
Objective: Nearly 75% of the abstracts in MEDLINE papers are presented in an unstructured format. This study aims to automate the reformatting of unstructured abstracts into the Introduction, Methods, Results, and Discussion (IMRAD) format. The quality of this reformatting relies on the features used in sentence classification. Therefore, we explored the most effective linguistic features in MEDLINE papers.
Methods: We constructed a feature set consisting of bag of words, linguistic features, grammatical features, and structural features. To evaluate effectiveness, defined as how well sentences are classified using these features, we constructed three datasets from the PubMed Central Open Access Subset: (1) structured abstracts (SA) for training, (2) unstructured RCT abstracts (UA-1), and (3) unstructured general abstracts (UA-2). F-score and accuracy were used to measure effectiveness at the IMRAD section level and for the overall classification.
Results: Adding linguistic features improved sentence classification accuracy by 1.2% to 35.8% across the three abstract datasets. The highest accuracies achieved were 91.7% in SA, 86.3% in UA-1, and 77.9% in UA-2. Linguistic features (dimensions = 15) had fewer dimensions than bag-of-words (dimensions = 1541). All representative linguistic features (n-grams, verb phrases, and noun phrases) for each section are identified in our system (available at http://abstract.bike.re.kr).
Conclusion: Linguistic features can be used to classify sentences in MEDLINE abstracts effectively and with a low computational burden.
A biomedical abstract reformatting method based on linguistic features is proposed. Representative linguistic features are identified for classification. A web application for structuralizing biomedical abstracts is provided.
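The abstract reports F-score per IMRAD section and accuracy for the overall classification. As a minimal sketch of how such scores are computed, assuming "F-score" means the balanced F1 (the abstract does not say which variant was used), the following uses only the Python standard library. The function name `evaluate` and the example labels are hypothetical and are not data from the paper.

```python
from collections import Counter

# The four IMRAD sections named in the abstract.
SECTIONS = ("Introduction", "Methods", "Results", "Discussion")


def evaluate(gold, predicted):
    """Return (overall accuracy, {section: F1}) for two parallel label lists."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # sentence assigned to its correct section
        else:
            fp[p] += 1   # predicted section gains a false positive
            fn[g] += 1   # true section gains a false negative

    accuracy = sum(tp.values()) / len(gold)

    f1 = {}
    for section in SECTIONS:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the section never occurs.
        denom = 2 * tp[section] + fp[section] + fn[section]
        f1[section] = 2 * tp[section] / denom if denom else 0.0
    return accuracy, f1


# Hypothetical sentence labels for one abstract (illustration only).
gold = ["Introduction", "Methods", "Methods", "Results", "Results", "Discussion"]
pred = ["Introduction", "Methods", "Results", "Results", "Results", "Discussion"]

accuracy, f1 = evaluate(gold, pred)
print(f"accuracy = {accuracy:.3f}")
for section in SECTIONS:
    print(f"{section}: F1 = {f1[section]:.3f}")
```

In this toy case one Methods sentence is mislabelled as Results, which lowers the F1 of both sections while leaving Introduction and Discussion unaffected.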

Year	DOI	Venue
2016	10.1016/j.compbiomed.2016.10.026	Comp. in Bio. and Med.
Keywords	Field	DocType
Biomedical research paper,Discriminative linguistic features,IMRAD format,Sentence classification,Structured abstract	Bag-of-words model,Verb phrase,Computer science,Artificial intelligence,Natural language processing,Web application,MEDLINE,Discriminative model,Noun phrase,Information retrieval,IMRAD,Sentence,Linguistics	Journal
Volume	Issue	ISSN
79	C	0010-4825
Citations	PageRank	References
0	0.34	0
Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Sejin Nam	1	28	23.20
Senator Jeong	2	22	2.83
Sang-Kyun Kim	3	0	0.34
Hong-Gee Kim	4	104	18.80
Victoria Ngo	5	12	1.61
Nansu Zong	6	45	5.68
