Title | ||
---|---|---|
Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts. |
Abstract | ||
---|---|---|
We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort. We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples except for a single ambiguous word. The system uses a novel and efficient graph-based algorithm to cluster words into groups that have the same meaning. Our algorithm follows the principle of finding a maximum margin between clusters, determining a split of the data that maximizes the minimum distance between pairs of data points belonging to two different clusters. On a test set of 21 ambiguous keywords from PubMed abstracts, our system has an average accuracy of 78%, outperforming a state-of-the-art unsupervised system by 2% and a baseline technique by 23%. On a standard data set from the National Library of Medicine, our system outperforms the baseline by 6% and comes within 5% of the accuracy of a supervised system. Our system is a novel, state-of-the-art technique for efficiently finding word sense clusters, and does not require training data or human effort for each new word to be disambiguated. |
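
The max-margin principle stated in the abstract (choose the two-way split that maximizes the minimum distance between points in different clusters) can be illustrated with a small sketch. This is not the authors' implementation: it uses the standard Kruskal-style single-linkage construction, which is known to yield such a split, and it assumes toy numeric points with Euclidean distance in place of word-context features. The function name `max_margin_split` is hypothetical.

```python
import math
from itertools import combinations


def max_margin_split(points):
    """Split points into two clusters so that the smallest distance between
    points in different clusters (the margin) is as large as possible.

    Illustrative assumption: points are numeric tuples compared with
    Euclidean distance; the paper's actual features and graph may differ.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # All pairwise edges, sorted by ascending distance (Kruskal order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    # Union-find over point indices.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    margin = None
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # edge lies inside an existing component
        if components == 2:
            # Shortest edge that would join the last two components:
            # this is the minimum cross-cluster distance.
            margin = dist
            break
        parent[root_i] = root_j
        components -= 1

    # Label the cluster containing point 0 as 0, the other as 1.
    first_root = find(0)
    labels = [0 if find(i) == first_root else 1 for i in range(n)]
    return labels, margin
```

For example, `max_margin_split([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)])` separates the first three points from the last two. Sorting all pairs costs O(n² log n), so this sketch only shows the principle and makes no claim about the efficiency of the paper's algorithm.
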
Year | DOI | Venue |
---|---|---|
2009 | 10.1186/1471-2105-10-S3-S4 | BMC Bioinformatics |
Keywords | Field | DocType |
bioinformatics,computational biology,microarrays,cluster analysis,algorithms | SemEval,Information retrieval,Computer science,Natural language processing,Artificial intelligence,Bioinformatics,Word sense,Cluster analysis,Word-sense disambiguation | Journal |
Volume | Issue | ISSN |
10 Suppl 3 | S-3 | 1471-2105 |
Citations | PageRank | References |
23 | 0.48 | 29 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Weisi Duan | 1 | 33 | 2.02 |
Min Song | 2 | 108 | 7.23 |
Alexander Yates | 3 | 898 | 51.53 |