Abstract | ||
---|---|---|
This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1162/COLI_a_00197 | Computational Linguistics |
Keywords | Field | DocType |
algorithms,design,automata,experimentation,measurement,languages,linguistics,performance,machine translation | Rule-based machine translation,Computer science,Machine translation,Phrase,Synchronous context-free grammar,Theoretical computer science,Natural language processing,Artificial intelligence,Language model,Algorithm,Finite-state machine,Pushdown automaton,Decoding methods | Journal |
Volume | Issue | ISSN |
40 | 3 | 0891-2017 |
Citations | PageRank | References |
8 | 0.53 | 42 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Cyril Allauzen | 1 | 690 | 47.64 |
William Byrne | 2 | 145 | 20.22 |
Adrià de Gispert | 3 | 472 | 35.22 |
Gonzalo Iglesias | 4 | 134 | 8.19 |
Michael Riley | 5 | 102 | 7.13 |