Recent News

EACL'17 paper on NMT MBR decoding

Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices
Felix Stahlberg, Adrià de Gispert, Eva Hasler, and Bill Byrne

Summarization and SMT, to appear in Computer Speech & Language

Source sentence simplification for statistical machine translation
E. Hasler, A. de Gispert, F. Stahlberg, A. Waite, and W. Byrne

ACL 2016 paper on syntactic MT and NMT

Syntactically Guided Neural Machine Translation
Felix Stahlberg, Eva Hasler, Aurelien Waite, and William Byrne

WMT'16 English-German System Description

The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16
Felix Stahlberg, Eva Hasler, and William Byrne

Rory Waite has successfully completed his PhD

Thesis: The Geometry of Statistical Machine Translation
thesis

Want to rescore translation lattices with bilingual neural networks? Try:

Transducer Disambiguation with Sparse Topological Features
Gonzalo Iglesias, Adrià de Gispert, and William Byrne
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015
http://www.aclweb.org/anthology/D/D15/D15-1273.pdf

HLT/NAACL 2015 paper on The Geometry of Statistical Machine Translation

Aurelien Waite and William Byrne
http://mi.eng.cam.ac.uk/~wjb31/PUBS/naaclhlt2015.aw5.pdf

Hierarchical Statistical Semantic Realization for Minimal Recursion Semantics

Matic Horvat, Ann Copestake, William Byrne
Proceedings of the International Conference on Computational Semantics (IWCS 2015)
ACL web link

Juan Pino has successfully completed his PhD and is now at Facebook

Thesis: Refinements in Hierarchical Phrase-Based Translation Systems
thesis

Eva Hasler will join the Cambridge SMT group

Eva will work on the EPSRC-funded project: `Improving Target Language Fluency in Statistical Machine Translation'

New EPSRC project on SMT -- POSTDOCTORAL RESEARCH POSITIONS AVAILABLE

Project: `Improving Target Language Fluency in Statistical Machine Translation'
EPSRC page

Open Source versions of the Cambridge SMT decoders have been released!

http://ucam-smt.github.io

COLING 2014: Hiero with GHKM tree-to-string translation rules

A paper introducing GHKM tree-to-string translation rules into Hiero will appear at COLING 2014.
http://mi.eng.cam.ac.uk/~wjb31/ppubs/XiaoTongCOLING2014.pdf

Summer 2014

Matic Horvat will spend the summer at the USC Information Sciences Institute as an intern in the Natural Language Group.
http://nlg.isi.edu/jobs.html

February 2014 - EACL'14 Student Research Workshop

A paper based on the MPhil thesis of Matic Horvat will appear at the 2014 EACL Student Research Workshop.
A Graph-Based Approach to String Regeneration
http://www.cl.cam.ac.uk/~mh693/files/eacl2014_paper_prepublish.pdf

December 2013 - Paper on Word Ordering Accepted to EACL 2014

Word Ordering with Phrase-Based Grammars
Adrià de Gispert, Marcus Tomalin, Bill Byrne
- We describe an approach to word ordering using modelling techniques from statistical machine translation. The system incorporates a phrase-based model of string generation that aims to take unordered bags of words and produce fluent, grammatical sentences. We describe the generation grammars and introduce parsing procedures that address the computational complexity of generation under permutation of phrases. Against the best previous results reported on this task, obtained using syntax driven models, we report huge quality improvements, with BLEU score gains of 20+ which we confirm with human fluency judgements. Our system incorporates dependency language models, large n-gram language models, and minimum Bayes risk decoding.
- http://mi.eng.cam.ac.uk/~wjb31/ppubs/EACL2014.pdf
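The abstract above lists minimum Bayes risk (MBR) decoding among the system's components. As a rough illustration only, the sketch below applies MBR to a small n-best list: it chooses the hypothesis with the lowest expected loss under the normalised list scores, rather than the single highest-scoring one. Using word-level edit distance as the loss, and the names `edit_distance` and `mbr_decode`, are assumptions made for this example; the abstract does not say how MBR is implemented in the paper, and this is not that implementation.

```python
import math
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance; insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete wa
                            curr[j - 1] + 1,             # insert wb
                            prev[j - 1] + (wa != wb)))   # substitute, or match at cost 0
        prev = curr
    return prev[-1]


def mbr_decode(nbest: List[Tuple[str, float]]) -> str:
    """Return the hypothesis with minimum expected edit distance to the n-best list.

    `nbest` holds (hypothesis, log_score) pairs; a softmax over the scores turns
    them into posteriors restricted to the list.
    """
    hyps = [h.split() for h, _ in nbest]
    top = max(s for _, s in nbest)
    weights = [math.exp(s - top) for _, s in nbest]
    z = sum(weights)
    posteriors = [w / z for w in weights]

    best_index, best_risk = 0, float("inf")
    for i, h in enumerate(hyps):
        # Expected loss of choosing h if the "true" translation is drawn from the list.
        risk = sum(p * edit_distance(h, e) for p, e in zip(posteriors, hyps))
        if risk < best_risk:
            best_index, best_risk = i, risk
    return nbest[best_index][0]


nbest = [("the cat sat", math.log(0.4)),
         ("a cat sat", math.log(0.35)),
         ("a cat sat down", math.log(0.25))]
print(mbr_decode(nbest))  # a cat sat
```

With posteriors 0.4, 0.35 and 0.25, the highest-scoring hypothesis is "the cat sat", but MBR prefers "a cat sat" (expected loss 0.65 against 0.85) because it is closest, on average, to the rest of the list.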

December 2013 - Collaboration with Michael Riley and Cyril Allauzen of Google Research to appear in Computational Linguistics

Pushdown Automata in Statistical Machine Translation.
C. Allauzen, B. Byrne, A. de Gispert, G. Iglesias, M. Riley.
Computational Linguistics. To appear.
- This paper describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata (FSA) representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger SCFGs and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.
- http://mi.eng.cam.ac.uk/~wjb31/ppubs/cl2013.final.pdf

October 2013 - Matic Horvat starts as a PhD student on SMT

Matic will work on Semantics in SMT, jointly supervised by Ann Copestake and Bill Byrne

2013-2014 Academic Visitors

Dr Tong Xiao from Northeastern University, China, is visiting Cambridge to work on syntax in Chinese-English machine translation.
Dr Anssi Yli-Jyrä from the University of Helsinki is visiting Cambridge as a Clare Hall Research Fellow to work on syntax and weighted finite state automata in translation.

August 2013 - WMT Presentation

The University of Cambridge Russian-English System at WMT13
Pino, J., Waite, A., Xiao, T., de Gispert, A., Flego, F., Byrne, W.
http://www.aclweb.org/anthology/W/W13/W13-2225.pdf

July 2013 - International Conference on Finite-State Methods and Natural Language Processing -- FSMNLP 2013

W. Byrne. Pushdown Automata in Statistical Machine Translation. Keynote lecture.

June 2013 - Cognition Institute Summer School: Bilingual minds, bilingual machines

Tutorial on `Syntax-based statistical machine translation'.
Cognition Institute Summer School: Bilingual minds, bilingual machines.

2013 Summer Students

Ed Hughes from the Department of Pure Maths and Mathematical Statistics will work with Rory Waite on SMT system optimisation using techniques from tropical geometry. Ed will be sponsored by the DPMMS Post Master’s Consultancy Scheme.

May 2013 - New Russian-English SMT system

Excellent results in Russian-English translation in the WMT 2013 evaluation. Link to official results

April 2013 FP7 FAUST Project Concludes Successfully -- faust-fp7.eu/faust/

Project receives a rating of 'Excellent progress (the project has fully achieved its objectives and technical goals and has even exceeded expectations)' in its final review in Luxembourg.

Excellent results in Chinese and Arabic Translation in the 2012 NIST OpenMT Evaluation

Official Results

September 2012 PBML Paper on using HFiles for fast MT model access

Simple and Efficient Model Filtering in Statistical Machine Translation
Juan Pino, Aurelien Waite, William Byrne
The Prague Bulletin of Mathematical Linguistics No. 98, 2012, pp. 5–24.
http://ufal.mff.cuni.cz/pbml/98/art-pino-waite-byrne.pdf
- Data availability and distributed computing techniques have allowed statistical machine translation (SMT) researchers to build larger models. However, decoders need to be able to retrieve information efficiently from these models to be able to translate an input sentence or a set of input sentences. We introduce an easy-to-implement, general-purpose solution to tackle this problem: we store SMT models as a set of key-value pairs in an HFile. We apply this strategy to two specific tasks: test set hierarchical phrase-based rule filtering and n-gram count filtering for language model lattice rescoring. We compare our approach to alternative strategies and show that its trade-offs in terms of speed, memory and simplicity are competitive.
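A minimal sketch of the key-value idea, under simplifying assumptions: an HFile is a sorted, immutable key-value file, so the toy `SortedKeyValueStore` class below (a hypothetical stand-in, not the HBase/HFile API) keeps its keys sorted and answers point lookups by binary search. Rule filtering then amounts to querying every source-side n-gram of the test set. The example treats rules as plain phrase pairs keyed by source string; real hierarchical rules contain nonterminals and would need a different set of queries.

```python
import bisect
from typing import Dict, List, Optional, Set


class SortedKeyValueStore:
    """Hypothetical stand-in for an HFile: immutable (key, value) pairs sorted by key."""

    def __init__(self, pairs: Dict[str, List[str]]):
        self._keys = sorted(pairs)
        self._values = [pairs[k] for k in self._keys]

    def get(self, key: str) -> Optional[List[str]]:
        # Binary search over the sorted keys.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


def source_ngrams(sentence: str, max_len: int) -> Set[str]:
    """All contiguous word n-grams of length 1..max_len."""
    words = sentence.split()
    return {" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, min(i + max_len, len(words)) + 1)}


def filter_rules(store: SortedKeyValueStore, test_set: List[str],
                 max_len: int = 3) -> Dict[str, List[str]]:
    """Keep only the rules whose source side occurs in the test set."""
    queries: Set[str] = set()
    for sentence in test_set:
        queries |= source_ngrams(sentence, max_len)
    filtered = {}
    # Query in key order, which suits a sorted file (this toy still binary-searches each key).
    for q in sorted(queries):
        value = store.get(q)
        if value is not None:
            filtered[q] = value
    return filtered


store = SortedKeyValueStore({"la casa": ["the house"],
                             "casa": ["house", "home"],
                             "el perro": ["the dog"]})
print(filter_rules(store, ["la casa blanca"]))
# {'casa': ['house', 'home'], 'la casa': ['the house']}
```

Only the rules that can actually apply to the test sentences are loaded, which is the point of test set filtering; the store itself never needs to be held in a mutable in-memory table.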

August 2012 MT Journal article on posteriors as translation confidence measures

N-gram posterior probability confidence measures for statistical machine translation: an empirical study
Adrià de Gispert, Graeme Blackwood, Gonzalo Iglesias and William Byrne
Machine Translation Journal,
http://www.springerlink.com/content/748552rj128q8337
- We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
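The n-gram posterior in the abstract can be read as p(u | F) = Σ_E 1[u ∈ E] P(E | F): the total probability of the translation hypotheses that contain n-gram u. The abstract compares posteriors from k-best lists with posteriors over the full lattice; the sketch below shows only the simpler k-best version, assuming the list scores are log-probabilities to be renormalised over the list. It is not the efficient lattice algorithm the paper describes, and the function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def ngrams(words: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Distinct n-grams of order n in a word sequence."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_posteriors(kbest: List[Tuple[str, float]],
                     max_n: int = 2) -> Dict[Tuple[str, ...], float]:
    """p(u) = sum of the normalised scores of the hypotheses containing n-gram u."""
    top = max(s for _, s in kbest)
    weights = [math.exp(s - top) for _, s in kbest]
    z = sum(weights)
    post: Dict[Tuple[str, ...], float] = defaultdict(float)
    for (hyp, _), w in zip(kbest, weights):
        words = hyp.split()
        present: Set[Tuple[str, ...]] = set()
        for n in range(1, max_n + 1):
            present |= ngrams(words, n)
        for u in present:  # an n-gram counts once per hypothesis, however often it occurs
            post[u] += w / z
    return dict(post)


kbest = [("the cat sat", math.log(0.5)),
         ("the cat sits", math.log(0.3)),
         ("a cat sat", math.log(0.2))]
p = ngram_posteriors(kbest)
print(round(p[("cat",)], 3), round(p[("the", "cat")], 3))  # 1.0 0.8
```

An n-gram shared by every hypothesis gets posterior 1, while one found only in low-scoring hypotheses gets a small value, which is what makes the quantity usable as a confidence measure.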

July 2012 FSMNLP paper on links between LMERT and tropical polynomials

Lattice-based minimum error rate training using weighted finite-state transducers with tropical polynomial weights.
A. Waite, G. Blackwood, W. Byrne
10th International Workshop on Finite State Methods and Natural Language Processing (FSMNLP 2012), Donostia-San Sebastian, Spain, July 2012.
- Minimum Error Rate Training (MERT) is a method for training the parameters of a log-linear model. One advantage of this method of training is that it can use the large number of hypotheses encoded in a translation lattice as training data. We demonstrate that the MERT line optimisation can be modelled as computing the shortest distance in a weighted finite-state transducer using a tropical polynomial semiring.
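For intuition about the tropical-polynomial view: along a search line λ + γd, each hypothesis e scores λ·h(e) + γ d·h(e), a straight line in γ, and the model's best score is the maximum over those lines, which is a max-plus (tropical) polynomial in γ. The sketch below computes that upper envelope for an n-best list, as in conventional n-best line search. The paper's contribution is obtaining the envelope over a whole lattice via WFST shortest distance, which this sketch does not attempt; `line_for` and `upper_envelope` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept): score(gamma) = intercept + slope * gamma


def line_for(features: Sequence[float], lam: Sequence[float],
             direction: Sequence[float]) -> Line:
    """Score of one hypothesis along lam + gamma * direction, as a line in gamma."""
    intercept = sum(l * h for l, h in zip(lam, features))
    slope = sum(d * h for d, h in zip(direction, features))
    return (slope, intercept)


def upper_envelope(lines: List[Line]) -> List[Tuple[float, int]]:
    """Return [(gamma_start, index), ...]: line `index` scores best from gamma_start
    up to the next entry's gamma_start (the last entry runs to +infinity)."""
    order = sorted(range(len(lines)), key=lambda i: lines[i])  # by slope, then intercept
    hull: List[int] = []
    starts: List[float] = []
    for i in order:
        slope, intercept = lines[i]
        while hull:
            top_slope, top_intercept = lines[hull[-1]]
            if top_slope == slope:
                # Parallel line with an intercept no larger than ours: it never wins.
                hull.pop()
                starts.pop()
                continue
            cross = (top_intercept - intercept) / (slope - top_slope)
            if cross <= starts[-1]:
                # We overtake the top line before it ever becomes best: discard it.
                hull.pop()
                starts.pop()
            else:
                break
        if hull:
            top_slope, top_intercept = lines[hull[-1]]
            starts.append((top_intercept - intercept) / (slope - top_slope))
        else:
            starts.append(float("-inf"))
        hull.append(i)
    return list(zip(starts, hull))


lines = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.5, -5.0)]
print(upper_envelope(lines))  # [(-inf, 2), (-1.0, 1), (1.0, 0)]
```

The fourth line never appears on the envelope, so it can never be the 1-best for any γ. Because the 1-best hypothesis only changes at the returned breakpoints, the error count along the line is piecewise constant and line optimisation needs only one evaluation per interval.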

July 2012 Rory Waite is spending the summer in Los Angeles as an intern at SDL

June 2012 EAMT 2012 Best Paper Award!

Can Automatic Post-Editing Make MT More Meaningful?
Kristen Parton, Nizar Habash, Kathleen McKeown, Gonzalo Iglesias, Adrià de Gispert

April 2012

Federico Flego joins the Delphi SMT project as an RA

EAMT 2012 paper with Columbia University

Can Automatic Post-Editing Make MT More Meaningful?
- Kristen Parton, Nizar Habash, Kathleen McKeown, Gonzalo Iglesias, Adrià de Gispert
- Automatic post-editors (APEs) enable the re-use of black box machine translation (MT) systems for a variety of tasks where different aspects of translation are important. In this paper, we describe APEs that target adequacy errors, a critical problem for tasks such as cross-lingual question-answering, and compare different approaches for post-editing: a rule-based system and a feedback approach that uses a computer in the loop to suggest improvements to the MT system. We test the APEs on two different MT systems and across two different genres. Human evaluation shows that the APEs significantly improve adequacy, regardless of approach, MT system or genre: 30-56% of the post-edited sentences have improved adequacy compared to the original MT.

March 2012 Marcus Tomalin seminar at University of Edinburgh

Marcus Tomalin gave a seminar titled `In Search of "Natural" Speech: Grammaticality, Acceptability, and Speech Technology' at the Edinburgh Linguistics Circle, March 2012
- Although state-of-the-art large vocabulary Automatic Speech Recognition (ASR) and Statistical Machine Translation (SMT) systems often achieve impressive Word Error Rates (WERs) and BLEU scores respectively, end-users frequently consider the word sequences output by such systems to be `unnatural'. The perceived `unnaturalness' usually results from the accumulation of many small linguistic errors (e.g., lack of subject-verb agreement, partially scrambled syntax, homophonic substitution). Consequently, in recent years there has been a renewed interest in improving the `naturalness' of ASR and SMT output, even in systems that produce good WER and BLEU scores.
  In this talk, the perceived `naturalness' of ASR and SMT transcriptions will be considered in the context of ongoing debates about grammaticality and acceptability. An experimental framework for exploring these aspects of ASR/SMT transcriptions is described, and a methodology for improving the `naturalness' of such outputs is presented. The simplest ways of modifying an input word sequence are insertion, permutation, deletion, and substitution, and the approach adopted in this work makes use of a Combinatory Categorial Grammar (CCG) text generation system which enables input word sequences to be modified so as to improve their `naturalness'. It is shown that the output produced by the CCG-based system is considerably improved if the N-best generated hypotheses are rescored and reranked using n-gram-based techniques.
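The last step described in the talk, reranking N-best generated hypotheses with n-gram techniques, can be illustrated with a toy add-one-smoothed bigram language model. This is a sketch under obvious assumptions: the tiny corpus, the smoothing choice and the `BigramLM` class are made up for illustration and are not the system presented in the talk.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Toy add-one-smoothed bigram language model with sentence boundary markers."""

    def __init__(self, corpus: List[str]):
        self.history_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = set()
        for sentence in corpus:
            words = ["<s>"] + sentence.split() + ["</s>"]
            vocab.update(words[1:])                      # words that can be predicted
            self.history_counts.update(words[:-1])
            self.bigram_counts.update(zip(words[:-1], words[1:]))
        self.vocab_size = len(vocab)

    def logprob(self, sentence: str) -> float:
        """Sum of log P(v | u) over the bigrams of the padded sentence."""
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((self.bigram_counts[(u, v)] + 1)
                            / (self.history_counts[u] + self.vocab_size))
                   for u, v in zip(words[:-1], words[1:]))


def rerank(nbest: List[str], lm: BigramLM) -> List[str]:
    """Order hypotheses from most to least fluent under the language model."""
    return sorted(nbest, key=lm.logprob, reverse=True)


lm = BigramLM(["the cat sat on the mat", "the dog sat on the mat"])
print(rerank(["cat the sat on mat the", "the cat sat on the mat"], lm)[0])
# the cat sat on the mat
```

A scrambled word order is penalised because most of its bigrams were never observed, so even this crude model pushes the fluent permutation to the top of the list.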

Yue Zhang appointed Assistant Professor

Dr. Yue Zhang will take up a position as Assistant Professor at Singapore University of Technology and Design with effect from July 2012. Yue has been a Research Associate at the Cambridge Computer Laboratory working on parsing and natural language generation for MT as part of the FAUST project.

February 2012 -- Article to appear in Speech Communication

Impacts of machine translation and speech synthesis on speech-to-speech translation
Kei Hashimoto, Junichi Yamagishi, William Byrne, Simon King, Keiichi Tokuda

26--27 January 2012 -- Short Course on Weighted Finite State Transducers in Statistical Machine Translation

Bill Byrne will give a six-lecture course on Weighted Finite State Transducers in Statistical Machine Translation at the 2012 International Winter School in Language and Speech Technologies, Tarragona, Spain.
- Course description

20 January 2012 -- Seminar

W. Byrne. Hierarchical phrase-based translation representations. Workshop on More Structure for Better Statistical Machine Translation?, University of Amsterdam, Netherlands, January 2012.

Postdoctoral Research Opportunities in SMT

Job Posting -- Research Associate in Statistical Machine Translation
Closing date is 30 December 2011

Marcus Tomalin joins the FAUST project

Marcus will work on shallow generation for fluency in MT

2011 Visitors

October 3--9 -- Brian Roark, Oregon Health & Science University (OHSU).

Seminar: Applications of Lexicographic Semirings in Speech and Language Processing

October 10--11 -- Markos Mylonakis, University of Amsterdam.

Seminar: Learning Hierarchical Translation Structure with Linguistic Annotations

September 2011

Graeme Blackwood (CUED PhD 2010) starts as Research Staff Member in the Machine Translation Group at the IBM T.J. Watson Research Center on the 3rd of October.
Cambridge Spanish-English and French-English interactive SMT systems are running on Reverso Labs
Real-time SMT systems based on Cambridge's HiFST decoder.
Try our systems and the other FAUST MT systems at http://labs.reverso.net

2011 Summer Internships

Juan Pino is in Mountain View, CA (USA) at Google, Inc. working on morphology in Russian MT
Matt Shannon is in London at Google, Inc. working on HMM-based speech synthesis

July 2011 -- Paper to be presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP'11) -- joint work with Google Research

Hierarchical Phrase-based Translation Representations.
Gonzalo Iglesias, William Byrne, Adrià de Gispert, Department of Engineering, University of Cambridge
Cyril Allauzen, Michael Riley, Google Research

19 July 2011 -- Bill Byrne discusses interactive machine translation on the BBC World Service Radio Programme 'Click'

The FP7 FAUST project is featured in a discussion on openness on the internet

Listen: BBC Programme website, with audio

FAUST project website: http://faust-fp7.eu

An extended version of the interview broadcast on Click on BBC World Service Radio, 19th July 2011, is available on The Open University website.

June 2011 -- Cambridge Spanish-English interactive SMT systems are running on Reverso Labs

Real-time SMT systems based on Cambridge's HiFST decoder.
Try our systems and the other FAUST MT systems at http://labs.reverso.net

April 2011 -- Gonzalo Iglesias, EAMT Best Thesis Awardee 2010

From the EAMT website http://www.eamt.org/news/news_best_thesis_winner.php :

Dr. Gonzalo Iglesias has received a prize of €500 and has been granted a €200 bursary so that he can present a summary of his thesis at the Annual Conference of the EAMT (EAMT-2011) which will take place in Leuven, on May 30-31, 2011.

2011 Talks and Presentations

A. de Gispert. Hierarchical Phrase-Based Translation at University of Cambridge. Talk at Barcelona Media Innovation Centre, Barcelona, Catalonia (Spain), July 2011.
A. de Gispert. Hierarchical Phrase-Based Translation at University of Cambridge. Talk at Catalonia Research Group on Accessibility and Ambient Intelligence (CaiaC), Universitat Autònoma de Barcelona, Bellaterra, Catalonia (Spain), July 2011.
A. de Gispert. Hierarchical Phrase-Based Representations: Decoding with Push-Down Transducers and Entropy-Pruned Language Models. Talk at DARPA GALE PI Meeting, Arlington, VA (USA), May 2011.
A. de Gispert. Hierarchical Phrase-Based Translation at University of Cambridge. Talk at Google Research Labs, Mountain View, CA (USA), May 2011.
G. Blackwood. Minimum Bayes-Risk Lattice Rescoring Methods for Statistical Machine Translation. Natural Language Processing Seminar, Computer Lab, University of Cambridge. May 2011.
G. Blackwood. Lattice Rescoring Methods for Statistical Machine Translation. Talk at SRI, Menlo Park, CA (USA), April 2011.
G. Iglesias. Hierarchical Phrase-based Translation with Weighted Finite State Transducers. Invited Presentation and Best Thesis Award for 2010. 15th Annual Conference of the European Association for Machine Translation, Leuven, Belgium, May 2011.
G. Blackwood. Lattice Rescoring Methods for Statistical Machine Translation. Talk at IBM TJ Watson Research Labs, Yorktown Heights, NY (USA). February, 2011.

September 2010 -- Paper published in Computational Linguistics

Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Eduardo Banga, William Byrne. Hierarchical Phrase-based Translation with Weighted Finite State Transducers and Shallow-N Grammars. Computational Linguistics 36(3):505-533. September 2010 (PDF) (bib)

2010 Talks and Presentations

William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Keynote speech at IWSLT 2010, Paris (France), December 2010.
William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Natural Language Processing Group, Department of Computer Science, University of Sheffield, UK, December 2010.
William Byrne. Recent research in statistical machine translation. Winton Capital Management Internal Research Conference, November 2010. Invited presentation.
William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Dublin Computational Linguistics Research Seminar, Dublin, Ireland, November 2010.
Matthew Gibson and William Byrne. EMIME project overview. European Commission Information Society Conference (ICT 2010), Brussels, Belgium, September 2010.
A. de Gispert. Hierarchical Phrase-Based Translation with weighted finite state transducers. Talk at IST / INESC-id, Lisbon (Portugal), July 2010.
William Byrne and Adrià de Gispert. Fast Hiero grammars. DARPA GALE PI Meeting, Scottsdale, AZ, USA, April 2010.
William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Columbia University, New York, NY, USA, April 2010.
William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Google, Inc., Mountain View, CA, USA, April 2010.
William Byrne. FAUST project overview. ICT-FP7 Language Technology Days, Luxembourg, March 2010.

2010 Conference Papers

Adrià de Gispert, Juan Pino, William Byrne. Hierarchical phrase-based translation grammars extracted from alignment posterior probabilities. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, 2010.

Graeme Blackwood, Adrià de Gispert, William Byrne. Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices. Proceedings of the International Conference on Computational Linguistics (COLING) 2010

Juan Pino, Gonzalo Iglesias, Adrià de Gispert, Graeme Blackwood, Jamie Brunning and William Byrne. The CUED HiFST System for the WMT10 Translation Shared Task. ACL 2010 Joint Fifth Workshop on Statistical Machine Translation

Graeme Blackwood, Adrià de Gispert, William Byrne. Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices. Proc. Annual Meeting of the Association for Computational Linguistics (ACL) 2010

Mikko Kurimo, et al. Personalising speech-to-speech translation in the EMIME project. Proc. Annual Meeting of the Association for Computational Linguistics (ACL) 2010 (Demo session)

5 April 2010

Gonzalo Iglesias starts as an RA on the FAUST project

2010 Google Research Award

Weighted Finite State Transducers in Hierarchical Phrase-Based Translation
PI: Bill Byrne
Google contact: Michael Riley

WMT 2010 Shared Translation Tasks

Excellent performance in translation between Spanish, French, and English

New FP7 research project on interactive statistical machine translation

FAUST: Feedback Analysis for User Adaptive Statistical Translation (FP7-ICT-2009-4 STREP)
Project Fact Sheet

NIST 2009 Open Machine Translation Evaluation -- Top-ranked Arabic-to-English SMT system

Our system placed first in both the Single System Track and the System Combination Track

Statistical Machine Translation

Cambridge University
