Abstract | ||
---|---|---|
One important type of information contained in the biomedical research literature is newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system that identifies this information for future curation is essential. Such a system provides important, up-to-date data for database construction and updating, and even for text summarization. In this paper we present a machine learning method to identify these genotype-phenotype relationships. Because no large human-annotated corpus of genotype-phenotype relationships currently exists, a semi-automatic approach was used to annotate a small labelled training set, and a self-training method is proposed to annotate more sentences and enlarge the training set. The resulting machine-learned model was evaluated using a separate test set annotated by an expert. The results show that using only the small training set in a supervised learning method achieves good results (precision: 76.47, recall: 77.61, F-measure: 77.03), which are improved by applying a self-training method (precision: 77.70, recall: 77.84, F-measure: 77.77). Relationships between genotypes and phenotypes are biomedical information pivotal to the understanding of a patient's situation. Our proposed method is the first attempt to build a specialized system to identify genotype-phenotype relationships in biomedical literature. We achieve good results using a small training set. To improve the results, other linguistic contexts need to be explored and an appropriately enlarged training set is required. |
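
The self-training idea mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the abstract does not say which classifier, features, or selection criterion the paper uses, so the nearest-centroid classifier, the distance-based confidence score, and the `threshold` and `max_rounds` parameters below are all assumptions chosen only to keep the example self-contained.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Labelled = Tuple[Vector, int]


def fit_centroids(labelled: List[Labelled]) -> Dict[int, Vector]:
    """Toy stand-in classifier (an assumption): one mean vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in labelled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(centroids: Dict[int, Vector], x: Vector) -> Tuple[int, float]:
    """Return (label, confidence) for x; needs at least two classes.

    Confidence is d2 / (d1 + d2), where d1 and d2 are the distances to the
    nearest and second-nearest centroid, so it ranges from 0.5 to 1.0.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, y)
        for y, c in centroids.items()
    )
    (d1, label), (d2, _) = dists[0], dists[1]
    confidence = 0.5 if d1 + d2 == 0 else d2 / (d1 + d2)
    return label, confidence


def self_train(
    labelled: List[Labelled],
    unlabelled: List[Vector],
    threshold: float = 0.8,   # hypothetical confidence cut-off
    max_rounds: int = 10,     # hypothetical cap on iterations
) -> Tuple[Dict[int, Vector], List[Labelled], List[Vector]]:
    """Grow the training set with the model's own confident predictions."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = fit_centroids(labelled)
        remaining: List[Vector] = []
        added = 0
        for x in pool:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                labelled.append((x, label))  # pseudo-label joins the training set
                added += 1
            else:
                remaining.append(x)          # stays unlabelled for the next round
        pool = remaining
        if added == 0:                       # nothing confident left: stop
            break
    return fit_centroids(labelled), labelled, pool
```

Calling `self_train(seed, pool)` with a small labelled `seed` list and an unlabelled `pool` returns the final model, the enlarged training set, and the items that never reached the confidence threshold. In a setting like the one the abstract describes, the vectors would be replaced by sentence features and the toy classifier by whatever supervised learner is being trained.
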
Year | DOI | Venue |
---|---|---|
2017 | 10.1186/s13326-017-0163-8 | J. Biomedical Semantics |
Keywords | Field | DocType |
Computational linguistics,Genotype-phenotype relationship,Genotypes,Phenotypes,Self-training,Semi-automatic corpus annotation | Data science,Training set,Automatic summarization,Database construction,Computer science,Computational linguistics,Natural language processing,Artificial intelligence,Self training | Journal |
Volume | Issue | ISSN |
8 | 1 | 2041-1480 |
Citations | PageRank | References |
2 | 0.39 | 37 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Maryam Khordad | 1 | 10 | 1.64 |
Robert E. Mercer | 2 | 254 | 46.93 |