Title
Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov.
Abstract
Objective: Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials from ClinicalTrials.gov, annotated with genetic alteration information.

Methods: We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base.

Results and Discussion: The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed a trend toward increased use of targeted therapy for cancer, as well as the most frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy.
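The abstract reports precision and recall for the extraction system against manually reviewed trials. As a minimal sketch of how such figures are conventionally computed, the Python below compares a set of extracted gene-alteration pairs with a gold-standard set. The function name `precision_recall`, the tuple layout, and the toy annotations (including the placeholder trial IDs) are assumptions made for illustration. They are not the authors' code or data.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of
    (trial_id, gene, alteration) tuples, using exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted pairs that the reviewers confirmed.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reviewer-confirmed pairs that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy annotations; trial IDs are placeholders, not real records.
gold = {
    ("NCT-A", "BRAF", "V600E"),
    ("NCT-A", "KRAS", "mutation"),
    ("NCT-B", "HER2", "amplification"),
    ("NCT-C", "EGFR", "T790M"),
}
predicted = {
    ("NCT-A", "BRAF", "V600E"),
    ("NCT-B", "HER2", "amplification"),
    ("NCT-C", "EGFR", "exon 19 deletion"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.4f} recall={recall:.4f}")
```

Exact tuple matching is the simplest possible criterion. An actual evaluation may normalize gene names or accept partial matches, which the abstract does not specify.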
Year: 2016
DOI: 10.1093/jamia/ocw009
Venue: Journal of the American Medical Informatics Association
Keywords: personalized cancer therapy, natural language processing, clinical trial
Field: Document classification, Data mining, Precision medicine, Targeted therapy, Clinical trial, Information extraction, Knowledge base, Cancer Treatment Trial, Medicine, Cancer
DocType: Journal
Volume: 23
Issue: 4
ISSN: 1067-5027
Citations: 0
PageRank: 0.34
References: 9
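Authors: 13

| Name | Order | Citations | PageRank |
| --- | --- | --- | --- |
| Jun Xu | 1 | 4 | 0.78 |
| Hee-Jin Lee | 2 | 20 | 3.94 |
| Jia Zeng | 3 | 0 | 0.68 |
| Yonghui Wu | 4 | 1 | 0.71 |
| Yaoyun Zhang | 5 | 94 | 16.58 |
| Liang-Chin Huang | 6 | 0 | 0.68 |
| Amber Johnson | 7 | 0 | 0.34 |
| Vijaykumar Holla | 8 | 0 | 0.68 |
| Ann M Bailey | 9 | 0 | 0.68 |
| Trevor Cohen | 10 | 579 | 53.11 |
| Funda Meric-Bernstam | 11 | 4 | 4.80 |
| Elmer V Bernstam | 12 | 5 | 1.18 |
| Hua Xu | 13 | 77 | 3.27 |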