Statistical Machine Translation

Cambridge University

Publications and Presentations, from 2002

2017

  • Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices. EACL, 2017. https://arxiv.org/abs/1612.03791
    We present a novel scheme to combine neural machine translation (NMT) with traditional statistical machine translation (SMT). Our approach borrows ideas from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is combined with the Bayes-risk of the translation according to the SMT lattice. This makes our approach much more flexible than n-best list or lattice rescoring as the neural decoder is not restricted to the SMT search space. We show an efficient and simple way to integrate risk estimation into the NMT decoder which is suitable for word-level as well as subword-unit-level NMT. We test our method on English-German and Japanese-English and report significant gains over lattice rescoring on several data sets for both single and ensembled NMT. The MBR decoder produces entirely new hypotheses far beyond simply rescoring the SMT search space or fixing UNKs in the NMT output.
  • Source sentence simplification for statistical machine translation. Computer Speech and Language. http://dx.doi.org/10.1016/j.csl.2016.12.001
    Long sentences with complex syntax and long-distance dependencies pose difficulties for machine translation systems. Short sentences, on the other hand, are usually easier to translate. We study the potential of addressing this mismatch using text simplification: given a simplified version of the full input sentence, can we use it in addition to the full input to improve translation? We show that the spaces of original and simplified translations can be effectively combined using translation lattices and compare two decoding approaches to process both inputs at different levels of integration. We demonstrate on source annotated portions of WMT test sets and on top of strong baseline systems combining hierarchical and neural translation for two language pairs that source simplification can help to improve translation quality.

2016

  • Syntactically Guided Neural Machine Translation. ACL, 2016. http://aclweb.org/anthology/P16-2049
    We investigate the use of hierarchical phrase-based SMT lattices in end-to-end neural machine translation (NMT). Weight pushing transforms the Hiero scores for complete translation hypotheses, with the full translation grammar score and full n-gram language model score, into posteriors compatible with NMT predictive probabilities. With a slightly modified NMT beam-search decoder we find gains over both Hiero and NMT decoding alone, with practical advantages in extending NMT to very large input and output vocabularies.
  • The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16. http://aclweb.org/anthology/W16-2324
    This paper presents the University of Cambridge submission to WMT16. Motivated by the complementary nature of syntactical machine translation and neural machine translation (NMT), we exploit the synergies of Hiero and NMT in different combination schemes. Starting out with a simple neural lattice rescoring approach, we show that the Hiero lattices are often too narrow for NMT ensembles. Therefore, instead of a hard restriction of the NMT search space to the lattice, we propose to loosely couple NMT and Hiero by composition with a modified version of the edit distance transducer. The loose combination outperforms lattice rescoring, especially when using multiple NMT systems in an ensemble.

2015

  • Transducer Disambiguation with Sparse Topological Features. EMNLP, 2015. http://www.aclweb.org/anthology/D/D15/D15-1273.pdf
    We describe a simple and efficient algorithm to disambiguate non-functional weighted finite state transducers (WFSTs), i.e. to generate a new WFST that contains a unique, best-scoring path for each hypothesis in the input labels along with the best output labels. The algorithm uses topological features combined with a tropical sparse tuple vector semiring. We empirically show that our algorithm is more efficient than previous work in a PoS-tagging disambiguation task. We use our method to rescore very large translation lattices with a bilingual neural network language model, obtaining gains in line with the literature.
  • The Geometry of Statistical Machine Translation. NAACL/HLT, 2015. http://www.aclweb.org/anthology/N/N15/N15-1041.pdf
    Most modern statistical machine translation systems are based on linear statistical models. One extremely effective method for estimating the model parameters is minimum error rate training (MERT), which is an efficient form of line optimisation adapted to the highly nonlinear objective functions used in machine translation. We describe a polynomial-time generalisation of line optimisation that computes the error surface over a plane embedded in parameter space. The description of this algorithm relies on convex geometry, which is the mathematics of polytopes and their faces. Using this geometric representation of MERT we investigate whether the optimisation of linear models is tractable in general. Previous work on finding optimal solutions in MERT (Galley and Quirk, 2011) established a worst-case complexity that was exponential in the number of sentences; in contrast, we show that exponential dependence in the worst-case complexity is mainly in the number of features. Although our work is framed with respect to MERT, the convex geometric description is also applicable to other error-based training methods for linear models. We believe our analysis has important ramifications because it suggests that the current trend in building statistical machine translation systems by introducing a very large number of sparse features is inherently not robust.

2014

  • Effective Incorporation of Source Syntax into Hierarchical Phrase-based Translation. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics. 2014. http://www.aclweb.org/anthology/C14-1195
    We explicitly consider source language syntactic information in both rule extraction and decoding for hierarchical phrase-based translation. We obtain tree-to-string rules by the GHKM method and use them to complement Hiero-style rules. All these rules are then employed to decode new sentences with source language parse trees. We experiment with our approach in a state-of-the-art Chinese-English system and demonstrate +1.2 and +0.8 BLEU improvements on the NIST newswire and web evaluation data of MT08 and MT12.
  • Word ordering with phrase-based grammars. In Proc of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL’14), 2014. Accepted. To appear. http://mi.eng.cam.ac.uk/~wjb31/ppubs/EACL2014.pdf.
    We describe an approach to word ordering using modelling techniques from statistical machine translation. The system incorporates a phrase-based model of string generation that aims to take unordered bags of words and produce fluent, grammatical sentences. We describe the generation grammars and introduce parsing procedures that address the computational complexity of generation under permutation of phrases. Against the best previous results reported on this task, obtained using syntax driven models, we report huge quality improvements, with BLEU score gains of 20+ which we confirm with human fluency judgements. Our system incorporates dependency language models, large n-gram language models, and minimum Bayes risk decoding.
  • C. Allauzen, W. Byrne, A. de Gispert, G. Iglesias, and M. Riley. Pushdown automata in statistical machine translation. Computational Linguistics, 2014. Accepted. To appear. http://mi.eng.cam.ac.uk/~wjb31/ppubs/cl2013.final.pdf.
    This paper describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata (FSA) representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger SCFGs and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.

2013

  • W. Byrne. Pushdown automata in statistical machine translation. International Conference on Finite-State Methods and Natural Language Processing, FSMNLP, 2013. Keynote lecture. http://fsmnlp2013.cs.st-andrews.ac.uk/abstracts.html#byrne.
    This talk will present some recent work investigating pushdown automata (PDA) in the context of statistical machine translation and alignment under synchronous context-free grammars (SCFGs). PDAs can be used to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence, and this presentation will give an overview of general-purpose PDA algorithms for replacement, composition, shortest path, and expansion. HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms, will be described and the complexity of the HiPDT decoder operations will be compared to decoders based on finite state automata and the widely used hypergraph representations. PDAs have strengths in a particular translation scenario: exact decoding with large SCFGs and relatively smaller language models. This talk is based on recent work with Adrià de Gispert and Gonzalo Iglesias at University of Cambridge, and Michael Riley and Cyril Allauzen at Google Research.
  • W. Byrne. Syntax-based statistical machine translation, and evaluation of machine translation systems. Cognition Institute Summer School: Bilingual Minds, Bilingual Machines, June 2013. Three lecture short course. http://www.plymouth.ac.uk/pages/dynamic.asp?page=events&eventID=7506&showEvent=1
  • Juan Pino, Aurelien Waite, Tong Xiao, Adrià de Gispert, Federico Flego, and William Byrne. The University of Cambridge Russian-English system at WMT13. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 200–205, Sofia, Bulgaria, August 2013. Association for Computational Linguistics. http://www.aclweb.org/anthology/W13-2225.
    This paper describes the University of Cambridge submission to the Eighth Workshop on Statistical Machine Translation. We report results for the Russian-English translation task. We use multiple segmentations for the Russian input language. We employ the Hadoop framework to extract rules. The decoder is HiFST, a hierarchical phrase-based decoder implemented using weighted finite-state transducers. Lattices are rescored with a higher order language model and minimum Bayes-risk objective.

2012

  • A. de Gispert, G. Blackwood, G. Iglesias, and W. Byrne. N-gram posterior probability confidence measures for statistical machine translation: an empirical study. Machine Translation, pages 1–30 (31 pages), 2012. Published online 1 September 2012. http://dx.doi.org/10.1007/s10590-012-9132-2.
    We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
  • J. Pino, A. Waite, and W. Byrne. Simple and efficient model filtering in statistical machine translation. The Prague Bulletin of Mathematical Linguistics, (98):5–24 (20 pages), 2012. Published online 6 September 2012. http://ufal.mff.cuni.cz/pbml-91-100.html.
    Data availability and distributed computing techniques have allowed statistical machine translation (SMT) researchers to build larger models. However, decoders need to be able to retrieve information efficiently from these models to be able to translate an input sentence or a set of input sentences. We introduce an easy-to-implement, general-purpose solution to tackle this problem: we store SMT models as a set of key-value pairs in an HFile. We apply this strategy to two specific tasks: test set hierarchical phrase-based rule filtering and n-gram count filtering for language model lattice rescoring. We compare our approach to alternative strategies and show that its trade-offs in terms of speed, memory and simplicity are competitive.
  • K. Hashimoto, J. Yamagishi, W. Byrne, S. King, and K. Tokuda. Impacts of machine translation and speech synthesis on speech-to-speech translation. Speech Communication, 54(7):857–866 (10 pages), September 2012. http://www.sciencedirect.com/science/article/pii/S0167639312000283.
    This paper analyzes the impacts of machine translation and speech synthesis on speech-to-speech translation systems. A typical speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques have been proposed for integration of speech recognition and machine translation. However, corresponding techniques have not yet been considered for speech synthesis. The focus of the current work is machine translation and speech synthesis, and we present a subjective evaluation designed to analyze their impact on speech-to-speech translation. The results of these analyses show that the naturalness and intelligibility of the synthesized speech are strongly affected by the fluency of the translated sentences. In addition, various features were found to correlate well with the average fluency of the translated sentences and the average naturalness of the synthesized speech.
  • The CUED OpenMT12 Arabic-English and Chinese-English SMT systems. NIST Open MT Workshop, Washington, DC, July 2012. Presentation [PDF]. http://www.nist.gov/itl/iad/mig/openmt12results.cfm
  • A. Waite, G. Blackwood, and W. Byrne. Lattice-based minimum error rate training using weighted finite-state transducers with tropical polynomial weights. In Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing (FSMNLP 2012), Donostia-San Sebastian, Spain, July 2012. (11 pages). Paper [PDF], Presentation [PDF]. http://ixa2.si.ehu.es/fsmnlp2012/.
    Minimum Error Rate Training (MERT) is a method for training the parameters of a log-linear model. One advantage of this method of training is that it can use the large number of hypotheses encoded in a translation lattice as training data. We demonstrate that the MERT line optimisation can be modelled as computing the shortest distance in a weighted finite-state transducer using a tropical polynomial semiring.
  • W. Byrne. Statistical machine translation. Cambridge Language Sciences Launch Event, Newnham College, Cambridge, May 2012. http://www.languagesciences.cam.ac.uk/event-reports/cambridge-language-sciences-launch-event
  • W. Byrne. Hierarchical phrase-based translation representations. Workshop on ‘More Structure for Better Statistical Machine Translation?’, University of Amsterdam, Netherlands, January 2012. Invited lecture. http://staff.science.uva.nl/~simaan/workshop2012.html
  • W. Byrne. Weighted finite state transducers in statistical machine translation. International Winter School in Language and Speech Technologies (WSLST 2012), Tarragona, Spain, January 2012. Six lecture short course. http://grammars.grlmc.com/wslst2012/courseDescription.php#Byrne.
    This short course will present some recent advances in statistical machine translation (SMT) using modelling approaches based on Weighted Finite State Transducers (WFSTs) and Finite State Automata (FSA). The course focus will be on decoding procedures for SMT, i.e. the generation of translations using stochastic translation grammars and language models. WFSTs can offer a very powerful modelling framework for language processing. For problems which can be formulated in terms of WFSTs or FSAs, there are general purpose algorithms which can be used to implement efficient and exact search and estimation procedures. This is true even for problems which are not inherently finite state, such as translation with some stochastic context free grammars. The course will begin with an introduction to WFSTs, pushdown automata, and semirings in the context of SMT. The use of WFST and FSA modelling approaches will be presented for: SMT decoding with phrase-based models; SMT decoding with stochastic synchronous context free grammars (e.g. Hiero); SMT parameter optimisation (MERT); the use of large language models and ‘fast’ grammars in translation; translation lattice generation; and rescoring procedures such as minimum Bayes risk decoding and system combination. Implementations using the OpenFst toolkit will also be described. The course material will be suitable for researchers already familiar with SMT and who wish to learn about alternative methods in decoder design. Enough background will be given so that researchers new to machine translation or unfamiliar with applications of WFSTs in natural language processing will also find the material appropriate.

2011

  • A. de Gispert, W. Byrne, J. Xu, R. Zbib, J. Makhoul, A. Chalabi, H. Nader, N. Habash, and F. Sadat. Preprocessing Arabic for Arabic-English statistical machine translation. In J. Olive, C. Christianson, and J. McCary, editors, Handbook of natural language processing and machine translation. DARPA Global Autonomous Language Exploitation, pages 135 – 145 (11 pages). Springer, 2011.
  • J. Dines, H. Liang, L. Saheer, M. Gibson, W. Byrne, K. Oura, K. Tokuda, J. Yamagishi, S. King, M. Wester, T. Hirsimäki, R. Karhila, and M. Kurimo. Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis. Computer Speech and Language, page (18 pages), 2011. In press. Available online 17 September 2011. doi:10.1016/j.csl.2011.08.003. http://www.sciencedirect.com/science/article/pii/S0885230811000441.
    In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation in which we employ an HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt synthesised speech output to the voice of the user by way of speech recognition. In this work we present results of several different unsupervised and cross-lingual adaptation approaches as well as an end-to-end speaker adaptive speech-to-speech translation system. Our experiments show that we can successfully apply speaker adaptation in both unsupervised and cross-lingual scenarios and our proposed algorithms seem to generalise well for several language pairs. We also discuss important future directions including the need for better evaluation metrics.

  • K. Hashimoto, J. Yamagishi, W. Byrne, S. King, and K. Tokuda. An analysis of machine translation and speech synthesis in speech-to-speech translation system. In Proceedings of IEEE Conference on Acoustics, Speech and Signal Processing, pages 5108 – 5111 (4 pages), 2011. http://dx.doi.org/10.1109/ICASSP.2011.5946361.
    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Recently, many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. The quality of synthesized speech is important, since users will not understand what the system said if the quality of synthesized speech is bad. Therefore, in this paper, we focus on the machine translation and speech synthesis components, and report a subjective evaluation to analyze the impact of each component. The results of these analyses show that the machine translation component affects the performance of speech-to-speech translation greatly, and that fluent sentences lead to higher naturalness and lower word error rate of synthesized speech.

  • G. Iglesias, C. Allauzen, W. Byrne, A. de Gispert, and M. Riley. Hierarchical phrase-based translation representations. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1373–1383 (11 pages), Edinburgh, Scotland, UK, July 2011. Association for Computational Linguistics. http://www.aclweb.org/anthology/D11-1127.
    This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.

2010

  • Adrià de Gispert, Juan Pino and William Byrne. Hierarchical Phrase-based Translation Grammars Extracted from Alignment Posterior Probabilities. In Proc. of the Conf. on Empirical Methods in Natural Language Processing (EMNLP), October 2010.
    We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignment model. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteriors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-to-target and target-to-source alignment models is to build two separate systems and combine their output translation lattices.
  • William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Presented at Columbia University, New York, NY, USA, April 2010.
  • William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Presented at Google, Inc, Mountain View, CA, USA, April 2010.
  • Juan Pino, Gonzalo Iglesias, Adrià de Gispert, Graeme Blackwood, Jamie Brunning, and William Byrne. The CUED HiFST system for the WMT10 translation shared task. In Proceedings of the ACL 2010 Joint Fifth Workshop on Statistical Machine Translation, 2010.
    This paper describes the Cambridge University Engineering Department submission to the Fifth Workshop on Statistical Machine Translation. We report results for the French-English and Spanish-English shared translation tasks in both directions. The CUED system is based on HiFST, a hierarchical phrase-based decoder implemented using weighted finite-state transducers. In the French-English task, we investigate the use of context-dependent alignment models. We also show that lattice minimum Bayes-risk decoding is an effective framework for multi-source translation, leading to large gains in BLEU score.
  • Graeme Blackwood, Adrià de Gispert, and William Byrne. Fluency constraints for minimum Bayes-risk decoding of statistical machine translation lattices. In Proceedings of the International Conference on Computational Linguistics (COLING), 2010.
    A novel and robust approach to incorporating natural language generation into statistical machine translation is developed within a minimum Bayes-risk decoding framework. Segmentation of translation lattices is guided by confidence measures over the maximum likelihood translation hypothesis in order to focus on regions with potential translation errors. Modeling techniques intended to improve fluency in low confidence regions are introduced so as to improve overall translation fluency.
  • Graeme Blackwood, Adrià de Gispert, and William Byrne. Efficient path counting transducers for minimum Bayes-risk decoding of statistical machine translation lattices. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2010.
    This paper presents an efficient implementation of linearised lattice minimum Bayes-risk decoding using weighted finite state transducers. We introduce transducers to efficiently count lattice paths containing n-grams and use these to gather the required statistics. We show that these procedures can be implemented exactly through simple transformations of word sequences to sequences of n-grams. This yields a novel implementation of lattice minimum Bayes-risk decoding which is fast and exact even for very large lattices.
  • Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Eduardo R. Banga, and William Byrne. Hierarchical phrase-based translation with weighted finite state transducers and shallow-N grammars. Computational Linguistics, 36(3), September 2010.
    In this paper we describe 'HiFST', a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also give insight as to how to control the size of the search space defined by hierarchical rules. We show that shallow-N grammars, low-level rule catenation and other search constraints can help to match the power of the translation system to specific language pairs.

2009

  • William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. The Johns Hopkins University Center for Language and Speech Processing, Baltimore, MD, USA, November 2009.
  • A de Gispert, G Iglesias, G Blackwood, J Brunning, and B Byrne. The CUED NIST 2009 Arabic-English SMT System. NIST Open Machine Translation 2009 Evaluation (MT09) Workshop, August 2009. Presentation (slides, poster, etc.)
  • W. Byrne. Context-dependent alignment models and hierarchical phrase-based translation with weighted finite state transducers. GALE PI Meeting, May 2009. Presentation (slides, poster, etc.)
  • G. Iglesias, A. de Gispert, E. R. Banga, and W. Byrne. Rule filtering by pattern for efficient hierarchical translation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2009), 2009. Presentation (slides, poster, etc.)
    We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on the number of non-terminals and the pattern, and various filtering strategies are then applied to assess the impact on translation speed and quality. Results are reported on the 2008 NIST Arabic-to-English evaluation task.
  • G. Iglesias, A. de Gispert, E. R. Banga, and W. Byrne. Hierarchical phrase-based translation with weighted finite state transducers. In Proceedings of NAACL-HLT, 2009. Presentation (slides, poster, etc.)
    This paper describes a lattice-based decoder for hierarchical phrase-based translation. The decoder is implemented with standard WFST operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, direct generation of translation lattices in the target language, better parameter optimization, and improved translation performance when rescoring with long-span language models and MBR decoding. We report translation experiments for the Arabic-to-English and Chinese-to-English NIST translation tasks and contrast the WFST-based hierarchical decoder with hierarchical translation under cube pruning.
  • Gonzalo Iglesias, Adrià de Gispert, Eduardo R. Banga, and William Byrne. The HiFST system for the Europarl Spanish-to-English task. In Proceedings of SEPLN, pages 207–214, 2009. Presentation (slides, poster, etc.)
    In this paper we present results for the Europarl Spanish-to-English translation task. We use HiFST, a novel hierarchical phrase-based translation system implemented with finite-state technology that creates target lattices rather than k-best lists.
  • A. de Gispert, S. Virpioja, M. Kurimo, and W. Byrne. Minimum Bayes risk combination of translation hypotheses from alternative morphological decompositions. In Proceedings of NAACL-HLT, 2009. Presentation (slides, poster, etc.)
    We describe a simple strategy to achieve translation performance improvements by combining output from identical statistical machine translation systems trained on alternative morphological decompositions of the source language. Combination is done by means of Minimum Bayes Risk decoding over a shared N-best list. When translating into English from two highly inflected languages such as Arabic and Finnish we obtain significant improvements over simply selecting the best morphological decomposition.
  • J. Brunning, A. de Gispert, and W. Byrne. Context-dependent alignment models for statistical machine translation. In Proceedings of NAACL-HLT, 2009. Presentation (slides, poster, etc.)
    We introduce alignment models for Machine Translation that take into account the context of a source word when determining its translation. Since the use of these contexts alone causes data sparsity problems, we develop a decision tree algorithm for clustering the contexts based on optimisation of the EM auxiliary function. We show that our context-dependent models lead to an improvement in alignment quality, and an increase in translation quality when the alignments are used to build a machine translation system.
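
Several of the 2009 entries above rely on minimum Bayes-risk (MBR) decoding over an N-best list. As a rough illustration of the general idea only, and not of the systems described in these papers, the following Python sketch selects the hypothesis with the highest expected gain under the list's posterior. The gain function (a clipped unigram overlap standing in for a BLEU-style gain), the function names, and the example probabilities are all assumptions made for this sketch.

```python
from collections import Counter


def unigram_gain(hyp, ref):
    """Toy gain (an assumption, not BLEU): clipped unigram matches
    divided by the length of the longer sentence."""
    hyp_words, ref_words = hyp.split(), ref.split()
    overlap = sum((Counter(hyp_words) & Counter(ref_words)).values())
    return overlap / max(len(hyp_words), len(ref_words), 1)


def mbr_decode(nbest):
    """Return the hypothesis with the highest expected gain.

    nbest: list of (hypothesis string, unnormalised probability) pairs.
    Every entry serves both as a candidate and as a pseudo-reference.
    """
    total = sum(p for _, p in nbest)
    best_hyp, best_score = None, float("-inf")
    for hyp, _ in nbest:
        # Expected gain of this candidate under the normalised posterior.
        expected = sum((p / total) * unigram_gain(hyp, other)
                       for other, p in nbest)
        if expected > best_score:
            best_hyp, best_score = hyp, expected
    return best_hyp


# Hypothetical shared N-best list (strings and probabilities are made up).
nbest = [
    ("the cat sat on the mat", 0.4),
    ("the cat sat on a mat", 0.3),
    ("the cat sits on a mat", 0.3),
]
print(mbr_decode(nbest))
```

With these made-up numbers the sketch selects the second hypothesis, "the cat sat on a mat", even though the first has the highest individual probability, because the second agrees more closely with the rest of the list.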

2008

  • W. Byrne. Phrase-based statistical machine translation with weighted finite state transducers. Presented at IRTG Summer School in Computational Linguistics and Psycholinguistics, September 2008. Invited tutorial.
    The Transducer Translation Model (TTM) for phrase-based statistical machine translation follows a generative model of translation and is implemented by the composition of component models realized as Weighted Finite State Transducers via the OpenFst Toolkit. This flexible architecture requires no special-purpose decoder and readily handles the large-scale natural language processing demands of state-of-the-art machine translation systems. This presentation describes how the system was used for the NIST 2008 Arabic-English machine translation evaluation task and for the Spanish-English and French-English translation in the ACL 2008 Third Workshop on Statistical Machine Translation Shared Task. General issues in using WFSTs for such tasks will also be discussed.
  • W. Byrne. Statistical machine translation. Advanced Machine Learning Tutorial Lectures Series, Cambridge University Engineering Department, February 2008.
  • A. de Gispert, G. Blackwood, J. Brunning, and W. Byrne. The CUED NIST 2008 Arabic-English SMT System. Presented at NIST MT Workshop, March 2008.
  • W. Byrne. Statistical techniques in machine translation. Google EMEA Faculty Summit, Zurich, Switzerland, 2008. Keynote lecture. Presentation (slides, poster, etc.)
  • G. Blackwood, A. de Gispert, J. Brunning, and W. Byrne. European language translation with weighted finite state transducers: The CUED MT system for the 2008 ACL workshop on statistical machine translation. In Proceedings of the ACL 2008 Third Workshop on Statistical Machine Translation, June 2008.
    We describe the Cambridge University Engineering Department phrase-based statistical machine translation system for Spanish-English and French-English translation in the ACL 2008 Third Workshop on Statistical Machine Translation Shared Task. The CUED system follows a generative model of translation and is implemented by composition of component models realised as Weighted Finite State Transducers, without the use of a special-purpose decoder. Details of system tuning for both Europarl and News translation tasks are provided.
  • G. Blackwood, A. de Gispert, and W. Byrne. Phrasal segmentation models for statistical machine translation. In Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, UK, August 2008.
    Phrasal segmentation models define a mapping from the words of a sentence to sequences of translatable phrases. We discuss the estimation of these models from large quantities of monolingual training text and describe their realization as weighted finite state transducers for incorporation into phrase-based statistical machine translation systems. Results are reported on the NIST Arabic-English translation tasks showing significant complementary gains in BLEU score with large 5-gram and 6-gram language models.
  • G. Blackwood, A. de Gispert, J. Brunning, and W. Byrne. Large-scale statistical machine translation with weighted finite state transducers. In Proceedings of FSMNLP 2008: Finite-State Methods and Natural Language Processing, Ispra, Lago Maggiore, Italy, September 2008.
    The Cambridge University Engineering Department phrase-based statistical machine translation system follows a generative model of translation and is implemented by the composition of component models realised as Weighted Finite State Transducers. Our flexible architecture requires no special purpose decoder and readily handles the large-scale natural language processing demands of state-of-the-art machine translation systems. In this paper we describe the CUED participation in the NIST 2008 Arabic-English machine translation evaluation task.
  • Y. Deng and W. Byrne. HMM word and phrase alignment for statistical machine translation. IEEE Transactions on Audio, Speech, and Language Processing, 16(3):494–507, March 2008.
    Efficient estimation and alignment procedures for word and phrase alignment HMMs are developed for the alignment of parallel text. The development of these models is motivated by an analysis of the desirable features of IBM Model 4, one of the original and most effective models for word alignment. These models are formulated to capture the desirable aspects of Model 4 in an HMM alignment formalism. Alignment behavior is analyzed and compared to human-generated reference alignments, and the ability of these models to capture different types of alignment phenomena is evaluated. In analyzing alignment performance, Chinese-English word alignments are shown to be comparable to those of IBM Model-4 even when models are trained over large parallel texts. In translation performance, phrase-based statistical machine translation systems based on these HMM alignments can equal and exceed systems based on Model-4 alignments, and this is shown in Arabic-English and Chinese-English translation. These alignment models can also be used to generate posterior statistics over collections of parallel text, and this is used to refine and extend phrase translation tables with a resulting improvement in translation quality.

2007

  • K.-C. Sim, W. Byrne, M. Gales, H. Sahbi, and P.C. Woodland. Consensus network decoding for statistical machine translation system combination. In IEEE Conference on Acoustics, Speech and Signal Processing, 2007.
    This paper presents a simple and robust consensus decoding approach for combining multiple Machine Translation (MT) system outputs. A consensus network is constructed from an N-best list by aligning the hypotheses against an alignment reference, where the alignment is based on minimising the translation edit rate (TER). The Minimum Bayes Risk (MBR) decoding technique is investigated for the selection of an appropriate alignment reference. Several alternative decoding strategies are proposed to retain coherent phrases in the original translations. Experimental results are presented primarily for three-way combination of Chinese-English translation outputs, and results for six-way system combination are also presented. It is shown that worthwhile improvements in translation performance can be obtained using the methods discussed.
  • X. A. Liu, W. J. Byrne, M. J. F. Gales, A. de Gispert, M. Tomalin, P. C. Woodland, and K. Yu. Discriminative language model adaptation for Mandarin broadcast speech transcription and translation. In Proc. IEEE Automatic Speech Recognition and Understanding (ASRU), Kyoto, Japan, 2007.

2006

  • Y. Deng and W. Byrne. MTTK: An alignment toolkit for statistical machine translation. Presented in the HLT-NAACL Demonstrations Program, June 2006. Presentation (slides, poster, etc.)
    The MTTK alignment toolkit for statistical machine translation can be used for word, phrase, and sentence alignment of parallel documents. It is designed mainly for building statistical machine translation systems, but can be exploited in other multilingual applications. It provides computationally efficient alignment and estimation procedures that can be used for the unsupervised alignment of parallel text collections in a language independent fashion. MTTK Version 1.0 is available under the Open Source Educational Community License.
  • W. Byrne. Integrating automatic speech recognition and statistical machine translation. TC-STAR OpenLab on Speech Translation, Trento, Italy, April 2006. Invited tutorial. Presentation (slides, poster, etc.)
  • W. Byrne. Statistical phrase-based speech translation. GALE Mid-Phase PI Meeting, March 2006. Presentation (slides, poster, etc.)
  • L. Mathias and W. Byrne. Statistical phrase-based speech translation. In IEEE Conference on Acoustics, Speech and Signal Processing, 2006. Presentation (slides, poster, etc.)
    A generative statistical model of speech-to-text translation is developed as an extension of existing models of phrase-based text translation. Speech is translated by mapping ASR word lattices to lattices of phrase sequences which are then translated using operations developed for text translation. Performance is reported on Chinese to English translation of Mandarin Broadcast News.
  • S. Kumar, Y. Deng, and W. Byrne. A weighted finite state transducer translation template model for statistical machine translation. Journal of Natural Language Engineering, 12(1):35–75, March 2006.
    We present a Weighted Finite State Transducer Translation Template Model for statistical machine translation. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard FSM operations involving these transducers. One of the benefits of using this framework is that it avoids the need to develop specialized search procedures, even for the generation of lattices or N-Best lists of bitext word alignments and translation hypotheses. We report and analyze bitext word alignment and translation performance of the model on French-English and Chinese-English tasks.

    See also CLSP Tech. Rep. 48, 2004
  • Y. Deng, S. Kumar, and W. Byrne. Segmentation and alignment of parallel text for statistical machine translation. Journal of Natural Language Engineering, 13(3):235–260, 2006.
    We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a stochastic generative process over text translation pairs, and derive two different alignment procedures based on the underlying alignment model. The first procedure is a now-standard dynamic programming alignment model which we use to generate an initial coarse alignment of the parallel text. The second procedure is a divisive clustering parallel text alignment procedure which we use to refine the first-pass alignments. This latter procedure is novel in that it permits the segmentation of the parallel text into sub-sentence units which are allowed to be reordered to improve the chunk alignment. The quality of chunk pairs is measured by the performance of machine translation systems trained from them. We show practical benefits of divisive clustering as well as how system performance can be improved by exploiting portions of the parallel text that otherwise would have to be discarded. We also show that chunk alignment as a first step in word alignment can significantly reduce word alignment error rate.

2005

  • S. Kumar, Y. Deng, and W. Byrne. Johns Hopkins University - Cambridge University Chinese-English and Arabic-English 2005 NIST MT Evaluation Systems. Presented at the 2005 NIST MT Workshop, June 2005. Presentation (slides, poster, etc.)
  • W. Byrne. Current Research in Phrase-Based Statistical Machine Translation – and some links to ASR. Presented at King's College London, May 2005. Presentation (slides, poster, etc.)
  • W. Byrne. Phrase-based statistical machine translation using finite state machines – with some links to ASR. Presented at the University of Washington, Seattle, WA, May 2005. Presentation (slides, poster, etc.)
  • S. Kumar, Y. Deng, and W. Byrne. JHU/CUED Chinese-English translation system – 2005 TC-STAR evaluation. Presented at the TC-STAR Evaluation Meeting, Trento, Italy, April 2005. Presentation (slides, poster, etc.)
  • W. Byrne. Current research in phrase-based statistical machine translation and some links to ASR. Seminar Series, Institute for Collaborative and Communicating Systems and Human Communication Research Centre, University of Edinburgh, January 2005. Presentation (slides, poster, etc.)
  • W. Byrne. Current research in phrase-based statistical machine translation and some links to ASR. Machine Intelligence Laboratory Speech Seminar, Cambridge University Engineering Department, March 2005. Presentation (slides, poster, etc.)
  • S. Kumar and W. Byrne. Local phrase reordering models for statistical machine translation. In Proceedings of HLT-EMNLP, 2005. Presentation (slides, poster, etc.)
    We describe stochastic models of local phrase movement that can be incorporated into a Statistical Machine Translation (SMT) system. These models provide properly formulated, non-deficient, probability distributions over reordered phrase sequences. They are implemented by Weighted Finite State Transducers. We describe EM-style parameter re-estimation procedures based on phrase alignment under the complete translation model incorporating reordering. Our experiments show that the reordering model yields substantial improvements in translation performance on Arabic-to-English and Chinese-to-English MT tasks. We also show that the procedure scales as the bitext size is increased.
  • Y. Deng and W. Byrne. HMM word and phrase alignment for statistical machine translation. In Proceedings of HLT-EMNLP, 2005. Presentation (slides, poster, etc.)
    HMM-based models are developed for the alignment of words and phrases in bitext. The models are formulated so that alignment and parameter estimation can be performed efficiently. We find that Chinese-English word alignment performance is comparable to that of IBM Model-4 even over large training bitexts. Phrase pairs extracted from word alignments generated under the model can also be used for phrase-based translation, and in Chinese to English and Arabic to English translation, performance is comparable to systems based on Model-4 alignments. Direct phrase pair induction under the model is described and shown to improve translation performance.

2004

  • W. Byrne. Minimum Risk Estimation and Decoding for Speech and Language Processing. Presented at Microsoft Research, Redmond, Washington, February 2004.
  • W. Byrne. Minimum Risk Estimation and Decoding for Speech and Language Processing. Presented at the Signal, Speech and Language Interpretation Lab, University of Washington, February 2004.
  • W. Byrne. Minimum Risk Estimation and Decoding for Speech and Language Processing. Presented at the Speech Analysis and Interpretation Laboratory, University of Southern California School of Engineering, February 2004.
  • S. Kumar et al. The Johns Hopkins University 2004 Chinese-English and Arabic-English MT Evaluation Systems. Presented at the 2004 NIST MT Workshop, June 2004. Presentation (slides, poster, etc.)
  • W. Byrne, S. Khudanpur, W. Kim, S. Kumar, P. Pecina, P. Virga, P. Xu, and D. Yarowsky. The Johns Hopkins University 2003 Chinese-English machine translation system. Presented at the 2003 NIST MT Workshop, June 2004. Presentation (slides, poster, etc.)
  • W. Byrne. Current research in statistical machine translation and links with automatic speech recognition. ISM Open Lectures on Statistical Speech Processing, The Institute for Statistical Mathematics, Tokyo, Japan, December 2004. Invited lecture. Presentation (slides, poster, etc.)
  • S. Kumar and W. Byrne. Minimum Bayes-risk decoding for statistical machine translation. In Proceedings of HLT-NAACL, 2004. Presentation (slides, poster, etc.)
    We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of linguistic information from word strings, word-to-word alignments from an MT system, and syntactic structure from parse-trees of source and target language sentences. We report the performance of the MBR decoders on a Chinese-to-English translation task. Our results show that MBR decoding can be used to tune statistical MT performance for specific loss functions.

2003

  • D. Oard, D. Doermann, B. Dorr, D. He, P. Resnik, W. Byrne, S. Khudanpur, D. Yarowsky, A. Leuski, P. Koehn, and K. Knight. Desperately seeking Cebuano. In Proceedings of HLT-NAACL, 2003.
    This paper describes an effort to rapidly develop language resources and component technology to support searching Cebuano news stories using English queries. Results from the first 60 hours of the exercise are presented.
  • S. Kumar and W. Byrne. A weighted finite state transducer implementation of the alignment template model for statistical machine translation. In Proceedings of HLT-NAACL, 2003. Presentation (slides, poster, etc.)
    We present a derivation of the alignment template model for statistical machine translation and an implementation of the model using weighted finite state transducers. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard FSM operations involving these transducers. One of the benefits of using this framework is that it obviates the need to develop specialized search procedures, even for the generation of lattices or N-Best lists of bitext word alignments and translation hypotheses. We evaluate the implementation of the model on the French-to-English Hansards task and report alignment and translation performance.
  • W. Byrne, S. Khudanpur, W. Kim, S. Kumar, P. Pecina, P. Virga, P. Xu, and D. Yarowsky. The Johns Hopkins University 2003 Chinese-English Machine Translation System. In Machine Translation Summit IX. The Association for Machine Translation in the Americas, 2003. Presentation (slides, poster, etc.)
    We describe a Chinese to English Machine Translation system developed at the Johns Hopkins University for the NIST 2003 MT evaluations. The system is based on a Weighted Finite State Transducer implementation of the alignment template translation model for statistical machine translation. The baseline MT system was trained using 100,000 sentence pairs selected from a static bitext training collection. Information retrieval techniques were then used to create specific training collections for each document to be translated. This document-specific training set included bitext and name entities that were then added to the baseline system by augmenting the library of alignment templates. We report translation performance of baseline and IR-based systems on two NIST MT evaluation test sets.

2002

  • S. Kumar and W. Byrne. Minimum Bayes-risk alignment of bilingual texts. In Proc. of the Conference on Empirical Methods in Natural Language Processing, Philadelphia, PA, USA, 2002. Presentation (slides, poster, etc.)
    We present Minimum Bayes-Risk word alignment for machine translation. This statistical, model-based approach attempts to minimize the expected risk of alignment errors under loss functions that measure alignment quality. We describe various loss functions, including some that incorporate linguistic analysis as can be obtained from parse trees, and show that these approaches can improve alignments of the English-French Hansards.