Title
GeoBoost2: A Natural Language Processing Pipeline for GenBank Metadata Enrichment for Virus Phylogeography
Abstract
Summary: We present GeoBoost2, a natural language processing pipeline that extracts the locations of infected hosts to enrich metadata in nucleotide sequence repositories such as the National Center for Biotechnology Information's GenBank, supporting downstream analyses including phylogeography and genomic epidemiology. The growing number of pathogen sequences calls for complementary information extraction methods that support focused research, including surveillance within countries and across borders. In this article, we describe the enhancements over our earlier release, including improved end-to-end extraction performance and speed, a fully functional web interface, and state-of-the-art deep learning methods for location extraction.
Year: 2020
DOI: 10.1093/bioinformatics/btaa647
Venue: BIOINFORMATICS
DocType: Journal
Volume: 36
Issue: 20
ISSN: 1367-4803
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name                          Order   Citations   PageRank
Arjun Magge                   1       2           2.74
Davy Weissenbacher            2       1           1.71
Karen O'Connor                3       0           1.35
Tasnia Tahsin                 4       30          3.28
Graciela Gonzalez-Hernandez   5       2           5.10
Matthew Scotch                6       123         11.56