Title
Tools for syntactic concordancing
Abstract
Concordancers are tools that display the immediate context for the occurrences of a given word in a corpus. Also called KWIC (Key Word in Context) tools, they are essential in the work of lexicographers, corpus linguists, and translators alike. We present an enhanced type of concordancer, which relies on a syntactic parser and on statistical association measures to detect those words in the context that are syntactically related to the sought word and are the most relevant for it, because together they may participate in multi-word expressions (MWEs). Our syntax-based concordancer highlights the MWEs in a corpus, groups them into syntactically homogeneous classes (e.g., verb-object, adjective-noun), ranks MWEs according to the strength of association with the given word, and for each MWE occurrence displays the whole source sentence as a context. In addition, parallel sentence alignment and MWE translation techniques are used to display the translation of the source sentence in another language, and to automatically find a translation for the identified MWEs. The tool also offers functionalities for building an MWE database, and is available both off-line and online for a number of languages (including English, French, Spanish, Italian, German, Greek and Romanian).
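The grouping and ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name a specific association measure, so pointwise mutual information (PMI) is used here as a stand-in. The `PAIRS` data and the `rank_collocates` function are hypothetical: the sketch assumes a parser has already produced (relation, word, word, sentence) tuples.

```python
import math
from collections import Counter, defaultdict

# Hypothetical parser output: (syntactic relation, word 1, word 2, source sentence).
PAIRS = [
    ("verb-object", "take", "decision", "They take a decision today."),
    ("verb-object", "take", "decision", "We must take a decision."),
    ("verb-object", "take", "book", "Take the book."),
    ("verb-object", "read", "book", "I read a book."),
    ("verb-object", "read", "book", "She read the book twice."),
    ("adjective-noun", "final", "decision", "It was the final decision."),
    ("adjective-noun", "good", "book", "A good book helps."),
]


def rank_collocates(pairs, keyword):
    """Group pairs containing `keyword` by relation and rank them by PMI.

    PMI is an assumed stand-in for the unspecified association measure.
    Returns {relation: [(score, word1, word2, [sentences]), ...]},
    with the strongest association first.
    """
    pair_counts = Counter((rel, w1, w2) for rel, w1, w2, _ in pairs)
    left_counts = Counter((rel, w1) for rel, w1, _, _ in pairs)
    right_counts = Counter((rel, w2) for rel, _, w2, _ in pairs)
    rel_totals = Counter(rel for rel, _, _, _ in pairs)

    # Keep every source sentence so each occurrence can be shown in context.
    contexts = defaultdict(list)
    for rel, w1, w2, sentence in pairs:
        contexts[(rel, w1, w2)].append(sentence)

    ranked = defaultdict(list)
    for (rel, w1, w2), n in pair_counts.items():
        if keyword not in (w1, w2):
            continue
        # PMI computed within one syntactic relation: log2(P(w1,w2) / (P(w1) P(w2))).
        expected = left_counts[(rel, w1)] * right_counts[(rel, w2)] / rel_totals[rel]
        score = math.log2(n / expected)
        ranked[rel].append((score, w1, w2, contexts[(rel, w1, w2)]))

    for rel in ranked:
        ranked[rel].sort(key=lambda item: (-item[0], item[1], item[2]))
    return dict(ranked)


if __name__ == "__main__":
    for rel, items in rank_collocates(PAIRS, "take").items():
        print(rel)
        for score, w1, w2, sentences in items:
            print(f"  {w1} + {w2}  score={score:.3f}  occurrences={len(sentences)}")
```

Counting within each syntactic relation keeps, for example, verb-object and adjective-noun candidates from competing in the same ranking, which mirrors the grouping into syntactically homogeneous classes mentioned above.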
Year
2010
DOI
10.1109/IMCSIT.2010.5679742
Venue
Computer Science and Information Technology
Keywords
natural language processing,word processing,KWIC,Key Word in Context tools,MWE translation techniques,multiword expressions,parallel sentence alignment,statistical association measures,syntactic concordancing,syntactic parser,syntactically homogeneous classes,syntax based concordancer
Field
Concordancer,Computer science,Grammar,Natural language processing,Artificial intelligence,Parsing,Key Word in Context,Syntax,Linguistics,Sentence,Word processing,German
DocType
Conference
ISSN
2157-5525
ISBN
978-1-4244-6432-6
Citations
0
PageRank
0.34
References
10
Authors
2
Name, Order, Citations, PageRank
Violeta Seretan, 1, 0, 0.34
Eric Wehrli, 2, 30, 5.35