Comparing and combining a semantic tagger and a statistical tool for MWE extraction - Citegraph

Paper Info

Title
Comparing and combining a semantic tagger and a statistical tool for MWE extraction

Abstract
Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Indeed, although numerous knowledge-based symbolic approaches and statistically driven algorithms have been proposed, efficient MWE extraction still remains an unsolved issue. In this paper, we evaluate the Lancaster UCREL Semantic Analysis System (henceforth USAS (Rayson, P., Archer, D., Piao, S., McEnery, T., 2004. The UCREL semantic analysis system. In: Proceedings of the LREC-04 Workshop, Beyond Named Entity Recognition Semantic labelling for NLP tasks, Lisbon, Portugal. pp. 7-12)) for MWE extraction, and explore the possibility of improving USAS by incorporating a statistical algorithm. Developed at Lancaster University, the USAS system automatically annotates English corpora with semantic category information. Employing a large-scale semantically classified multi-word expression template database, the system is also capable of detecting many multiword expressions, as well as assigning semantic field information to the MWEs extracted. Whilst USAS therefore offers a unique tool for MWE extraction, allowing us to both extract and semantically classify MWEs, it can sometimes suffer from low recall. Consequently, we have been comparing USAS, which employs a symbolic approach, to a statistical tool, which is based on collocational information, in order to determine the pros and cons of these different tools, and more importantly, to examine the possibility of improving MWE extraction by combining them. As we report in this paper, we have found a highly complementary relation between the different tools: USAS missed many domain-specific MWEs (law/court terms in this case), and the statistical tool missed many commonly used MWEs that occur in low frequencies (lower than three in this case). 
Due to their complementary relation, we are proposing that MWE coverage can be significantly increased by combining a lexicon-based symbolic approach and a collocation-based statistical approach.
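
The complementary relation described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the `LEXICON` set is a hypothetical stand-in for a list of known MWEs (it is not the USAS template database), candidates are limited to adjacent word pairs, and the statistical route uses a raw frequency count because the abstract does not name the collocation measure. Only the cut-off of three comes from the abstract.

```python
from collections import Counter

# Hypothetical stand-in for a lexicon of known MWEs (NOT the USAS template database).
LEXICON = {("in", "fact"), ("of", "course")}

# Frequency cut-off, echoing the "lower than three" figure in the abstract.
MIN_FREQ = 3


def bigrams(tokens):
    """Return all adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def lexicon_candidates(tokens, lexicon=LEXICON):
    """Symbolic route: keep bigrams listed in the lexicon, whatever their frequency."""
    return {bg for bg in bigrams(tokens) if bg in lexicon}


def statistical_candidates(tokens, min_freq=MIN_FREQ):
    """Statistical route (simplified assumption): keep bigrams seen at least min_freq times."""
    counts = Counter(bigrams(tokens))
    return {bg for bg, n in counts.items() if n >= min_freq}


def combined_candidates(tokens):
    """Union of both routes, so that each can cover the other's misses."""
    return lexicon_candidates(tokens) | statistical_candidates(tokens)


tokens = ("the crown court sat . the crown court rose . "
          "in fact the crown court adjourned").split()
symbolic = lexicon_candidates(tokens)
statistical = statistical_candidates(tokens)
combined = combined_candidates(tokens)
```

In this toy input the lexicon route finds the low-frequency general expression ("in", "fact") but not the domain term ("crown", "court"), while the frequency route does the reverse; the union contains both. A raw count also admits uninteresting pairs such as ("the", "crown"), which is one reason a real collocation tool would score candidates with an association measure rather than frequency alone.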

Year	DOI	Venue
2005	10.1016/j.csl.2004.11.002	Computer Speech & Language
Keywords	Field	DocType
usas system,multiword expression,mwe extraction,semantic tagger,mwe coverage,different tool,henceforth usas,automatic extraction,statistical tool,whilst usas,complementary relation,low frequency,knowledge base	Expression (mathematics),Computer science,Natural language processing,Corpus linguistics,Artificial intelligence,Semantic field,Collocation,Computational linguistics,Speech recognition,Statistical algorithm,Lexicon,Named-entity recognition,Machine learning	Journal
Volume	Issue	ISSN
19	4	0885-2308
Citations	PageRank	References
24	1.33	12
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Scott S. L. Piao	1	93	12.65
Paul Rayson	2	538	54.59
Dawn Archer	3	32	3.31
Tony McEnery	4	53	8.87
