Title
Extracting subject demographic information from abstracts of randomized clinical trial reports.
Abstract
To make more informed healthcare decisions, consumers need information systems that deliver accurate and reliable information about their illnesses and potential treatments. Reports of randomized clinical trials (RCTs) provide reliable medical evidence about the efficacy of treatments. Current methods to access, search for, and retrieve RCTs are keyword-based and time-consuming, and they suffer from poor precision. Personalized semantic search and medical evidence summarization aim to solve this problem. The performance of these approaches may improve if they have access to study subject descriptors (e.g., age, gender, and ethnicity), trial sizes, and diseases/symptoms studied. We have developed a novel method to automatically extract such subject demographic information from RCT abstracts. We used text classification augmented with a Hidden Markov Model to identify sentences containing subject demographics; these sentences were subsequently parsed using Natural Language Processing techniques to extract relevant information. Our results show accuracy levels of 82.5%, 92.5%, and 92.0% for extraction of subject descriptors, trial sizes, and diseases/symptoms descriptors, respectively.
Year
2007
DOI
10.3233/978-1-58603-774-1-550
Venue
Studies in Health Technology and Informatics
Keywords
randomized clinical trial, information extraction, subject demographics, semantics, statistical NLP, text analysis
Field
Information system, Automatic summarization, Data mining, Information retrieval, Semantic search, Randomized controlled trial, Information extraction, Hidden Markov model, Study Subject, Medicine, Semantics
DocType
Conference
Volume
129
Issue
Pt 1
ISSN
0926-9630
Citations
11
PageRank
0.77
References
1
Authors
6
Name
Order
Citations
PageRank
Rong Xu1322.21
Yael Garten21838.73
Kaustubh S. Supekar3414.26
Amar K. Das442051.09
Russ B. Altman52500456.07
Alan M. Garber6333.31