Title | ||
---|---|---|
Automatic de-identification of electronic medical records using token-level and character-level conditional random fields |
Abstract | ||
---|---|---|
De-identification, the task of identifying and removing all protected health information (PHI) present in clinical data such as electronic medical records (EMRs), is a critical step in making clinical data publicly available. The 2014 i2b2 (Center of Informatics for Integrating Biology and Bedside) clinical natural language processing (NLP) challenge set up a track for de-identification (track 1). In this study, we propose a hybrid system for the de-identification track that combines machine learning and rule-based approaches. In our system, PHI instances are first identified by two conditional random fields (CRFs), one token-level and one character-level, together with a rule-based classifier, and the results are then merged by a set of rules. Experiments conducted on the i2b2 corpus show that our system submitted for the challenge achieves the highest micro F-scores of 94.64%, 91.24% and 91.63% under the 'token', 'strict' and 'relaxed' criteria respectively, placing it among the top-ranked systems of the 2014 i2b2 challenge. After integrating some refined localization dictionaries, our system is further improved, with F-scores of 94.83%, 91.57% and 91.95% under the 'token', 'strict' and 'relaxed' criteria respectively. © 2015 Elsevier Inc. |
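
The abstract reports micro-averaged F-scores, so a small sketch of how such a score can be computed may help. This is a minimal illustration, not the challenge's evaluation script: it assumes a 'strict'-style criterion in which a predicted PHI span counts as correct only when its document, offsets and category all match a gold span exactly. The `Span` tuple layout, the `micro_prf` name and the sample annotations are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical annotation layout: (document id, start offset, end offset, PHI category).
Span = Tuple[str, int, int, str]


def micro_prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under exact-match scoring.

    Counts are pooled over all documents and PHI categories before the
    ratios are taken, which is what makes the average "micro".
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    true_positives = len(gold_set & pred_set)  # exact match on every field

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up annotations for illustration only.
gold = [
    ("doc1", 0, 10, "NAME"),
    ("doc1", 20, 30, "DATE"),
    ("doc2", 5, 9, "AGE"),
]
pred = [
    ("doc1", 0, 10, "NAME"),
    ("doc1", 20, 29, "DATE"),   # end offset differs, so no exact match
    ("doc2", 5, 9, "AGE"),
    ("doc2", 40, 44, "CITY"),   # spurious prediction
]

p, r, f = micro_prf(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

A more lenient criterion would change only the matching step (for example, tolerating small offset differences or comparing token by token); the pooling of counts stays the same.
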
Year | DOI | Venue |
---|---|---|
2015 | 10.1016/j.jbi.2015.06.009 | Journal of Biomedical Informatics |
Keywords | Field | DocType |
De-identification,Electronic medical records,Hybrid method,Natural language processing,Protected health information,i2b2 | Conditional random field,Informatics,Data mining,De-identification,Computer science,Protected health information,Artificial intelligence,Natural language processing,Classifier (linguistics),Hybrid system,Security token,CRFS | Journal |
Volume | Issue | ISSN |
58 | Supplement | 1532-0464 |
Citations | PageRank | References |
10 | 0.57 | 18 |
Authors | ||
9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zengjian Liu | 1 | 35 | 3.84 |
Chen Yangxin | 2 | 16 | 1.01 |
Buzhou Tang | 3 | 368 | 34.04 |
Xiaolong Wang | 4 | 1208 | 115.39 |
Qingcai Chen | 5 | 809 | 66.72 |
Li Haodi | 6 | 25 | 3.89 |
Wang Jingfeng | 7 | 16 | 1.01 |
qiwen | 8 | 20 | 2.10 |
Zhu Suisong | 9 | 16 | 1.34 |