Title
A two-stage workflow to extract and harmonize drug mentions from clinical notes into observational databases
Abstract
Highlights
• Methodology to extract concepts from clinical notes to enrich OMOP CDM databases.
• Open-source and prepared to be integrated in migration pipelines of EHR databases.
• Semi-automatically harmonises and validates medical concepts in clinical notes.
Background
The clinical notes continuously collected throughout patients' health histories can provide relevant information about treatments and diseases, and can increase the value of the structured data available in Electronic Health Records (EHR) databases. EHR databases are currently used in observational studies that lead to important findings in the medical and biomedical sciences. However, the information present in clinical notes is not used in those studies, since the computational analysis of this unstructured data is much more complex than that of structured data.
Methods
We propose a two-stage workflow to close an existing gap in Extraction, Transformation and Loading (ETL) procedures for observational databases. The first stage of the workflow extracts the prescriptions present in patients' clinical notes, while the second stage harmonises the extracted information into its standard definition and stores the result in a common database schema used in observational studies.
Results
We validated this methodology using two distinct data sets, in which the goal was to extract and store drug-related information in a new Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) database. We analysed the performance of the annotator used, as well as its limitations. Finally, we described some practical examples of how users can explore these data sets once migrated to OMOP CDM databases.
Conclusion
With this methodology, we showed a strategy for using the information extracted from clinical notes in business intelligence tools, or in other applications such as data exploration through SQL queries. In addition, the extracted information complements the data present in OMOP CDM databases with content that was not directly available in the EHR database.
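The two stages described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular-expression annotator, the `TOY_VOCABULARY` mapping with its placeholder concept ids, and the reduced table (named after OMOP CDM's `drug_exposure`, with a hypothetical `dose_text` column) are all assumptions made for illustration only.

```python
import re
import sqlite3

# Hypothetical lookup: lower-cased drug string -> placeholder concept id.
# A real pipeline would resolve mentions against the OMOP vocabularies.
TOY_VOCABULARY = {
    "aspirin": 9000001,
    "acetylsalicylic acid": 9000001,  # synonym mapped to the same concept
    "metformin": 9000002,
}

# Longest names first so multi-word synonyms win over shorter alternatives.
_names = sorted(TOY_VOCABULARY, key=len, reverse=True)
DRUG_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, _names)) + r")\b"
    r"(?:\s+(\d+(?:\.\d+)?\s*mg))?",  # optional dose such as "500 mg"
    re.IGNORECASE,
)


def extract_mentions(note_text):
    """Stage 1: return (drug string, dose or None) pairs found in a note."""
    return [(m.group(1), m.group(2)) for m in DRUG_PATTERN.finditer(note_text)]


def harmonise(mentions):
    """Stage 2a: map each mention to a concept id (0 when nothing matches)."""
    return [
        (TOY_VOCABULARY.get(drug.lower(), 0), drug, dose)
        for drug, dose in mentions
    ]


def load(conn, person_id, rows):
    """Stage 2b: store harmonised rows in a simplified stand-in table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS drug_exposure ("
        "person_id INTEGER, drug_concept_id INTEGER, "
        "drug_source_value TEXT, dose_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO drug_exposure VALUES (?, ?, ?, ?)",
        [(person_id, concept_id, drug, dose) for concept_id, drug, dose in rows],
    )
    conn.commit()


def run_pipeline(conn, person_id, note_text):
    """Run extraction, harmonisation and loading for one clinical note."""
    rows = harmonise(extract_mentions(note_text))
    load(conn, person_id, rows)
    return rows
```

Once loaded, the rows can be explored with ordinary SQL, for example by counting exposures per `drug_concept_id`, which mirrors the kind of data exploration mentioned in the Conclusion.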
Year: 2021
DOI: 10.1016/j.jbi.2021.103849
Venue: Periodicals
Keywords: EHR, Clinical NLP, Clinical notes, Information extraction, Observational studies, OMOP CDM, ETL
DocType: Journal
Volume: 120
Issue: C
ISSN: 1532-0464
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name                  Order  Citations  PageRank
João Rafael Almeida   1      3          2.46
João Figueira Silva   2      1          2.03
Sérgio Matos          3      415        29.51
José Luis Oliveira    4      760        84.03