Title
The importance of the lexicon in tagging biological text
Abstract
A part-of-speech tagger is a fundamental and indispensable tool in computational linguistics, typically employed at the critical early stages of processing. Although taggers are widely available that achieve high accuracy in very general domains, these do not perform nearly as well when applied to novel specialized domains, and this is especially true with biological text. We present a stochastic tagger that achieves over 97.44% accuracy on MEDLINE abstracts. A primary component of the tagger is its lexicon, which enumerates the permitted parts-of-speech for the 10,000 words most frequently occurring in MEDLINE. We present evidence for the conclusion that the lexicon is as vital to tagger accuracy as a training corpus, and more important than previously thought.
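To illustrate how a lexicon of permitted parts-of-speech can constrain a stochastic tagger, here is a minimal sketch. It is not the authors' implementation: it swaps in a simple bigram hidden Markov model with Viterbi decoding, and every name in it (`LEXICON`, `TRANS`, `EMIT`, `candidate_tags`, `tag_sequence`), the Penn-Treebank-style tags, and all probabilities are invented for illustration only.

```python
import math

# Hypothetical toy lexicon: each known word maps to its permitted tags.
LEXICON = {
    "the": {"DT"},
    "cells": {"NNS"},
    "express": {"VBP", "JJ"},
    "protein": {"NN"},
}
ALL_TAGS = ["DT", "JJ", "NN", "NNS", "VBP"]

# Invented probabilities, standing in for estimates from a training corpus.
TRANS = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "DT"): 0.8, ("DT", "NNS"): 0.4, ("NNS", "VBP"): 0.5,
    ("NNS", "JJ"): 0.05, ("VBP", "NN"): 0.4, ("JJ", "NN"): 0.6,
}
EMIT = {  # P(word | tag)
    ("the", "DT"): 0.6, ("cells", "NNS"): 0.01, ("express", "VBP"): 0.01,
    ("express", "JJ"): 0.01, ("protein", "NN"): 0.01,
}
FLOOR = 1e-6  # small probability for unseen events


def candidate_tags(word):
    """Return the tags the lexicon permits; unknown words may take any tag."""
    return sorted(LEXICON.get(word.lower(), ALL_TAGS))


def tag_sequence(words):
    """Viterbi decoding restricted to lexicon-permitted tags."""
    if not words:
        return []
    # columns[i][tag] = (best log-probability ending in tag, previous tag)
    columns = []
    for i, word in enumerate(words):
        column = {}
        for tag in candidate_tags(word):
            emit = math.log(EMIT.get((word.lower(), tag), FLOOR))
            if i == 0:
                score = math.log(TRANS.get(("<s>", tag), FLOOR))
                column[tag] = (score + emit, None)
            else:
                prev, score = max(
                    ((p, s + math.log(TRANS.get((p, tag), FLOOR)))
                     for p, (s, _) in columns[i - 1].items()),
                    key=lambda pair: pair[1],
                )
                column[tag] = (score + emit, prev)
        columns.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(columns[-1], key=lambda t: columns[-1][t][0])
    tags = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = columns[i][tag][1]
        tags.append(tag)
    return tags[::-1]


print(tag_sequence(["The", "cells", "express", "protein"]))
```

Because the search only considers tags the lexicon allows, an ambiguous word such as "express" is resolved between two candidates rather than the full tag set, while a word missing from the lexicon falls back to every tag.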
Year
2006
DOI
10.1017/S1351324905003967
Venue
Natural Language Engineering
Field
Trigram tagger, Computer science, Computational linguistics, Speech recognition, Lexicon, Natural language processing, Artificial intelligence, MEDLINE
DocType
Journal
Volume
12
Issue
4
Citations
6
PageRank
0.45
References
11
Authors
3
Name                  Order  Citations  PageRank
Lawrence H. Smith     1      196        14.48
Thomas C Rindflesch   2      1620       147.18
W. John Wilbur        3      174        29.05