Taming Big Data: An Information Extraction Strategy for Large Clinical Text Corpora. - Citegraph

Paper Info

Title
Taming Big Data: An Information Extraction Strategy for Large Clinical Text Corpora.

Abstract
Concepts of interest for clinical and research purposes are not uniformly distributed in clinical text available in electronic medical records. The purpose of our study was to identify filtering techniques to select 'high yield' documents for increased efficacy and throughput. Using two large corpora of clinical text, we demonstrate the identification of 'high yield' document sets in two unrelated domains: homelessness and indwelling urinary catheters. For homelessness, the high yield set includes homeless program and social work notes. For urinary catheters, concepts were more prevalent in notes from hospitalized patients; nursing notes accounted for a majority of the high yield set. This filtering will enable customization and refining of information extraction pipelines to facilitate extraction of relevant concepts for clinical decision support and other uses.

Year	DOI	Venue
2015	10.3233/978-1-61499-538-8-175	Studies in Health Technology and Informatics
Keywords	Field	DocType
Big data, natural language processing, information extraction	Information retrieval, Computer science, Text corpus, Information extraction, Artificial intelligence, Natural language processing, Big data	Conference
Volume	ISSN	Citations
213	0926-9630	0
PageRank	References	Authors
0.34	1	7

Authors (7 rows)

Name	Order	Citations	PageRank
Adi Gundlapalli	1	47	14.74
Guy Divita	2	6	5.48
Marjorie Carter	3	8	5.52
Andrew Redd	4	11	6.59
Matthew H. Samore	5	143	26.07
Kalpana Gupta	6	0	0.68
Barbara W. Trautner	7	3	2.06
