Recent News
April 2012
Federico Flego joins the Delphi SMT project as an RA
EAMT 2012 paper with Columbia University
- Can Automatic Post-Editing Make MT More Meaningful?
- Kristen Parton, Nizar Habash, Kathleen McKeown, Gonzalo Iglesias, Adrià de Gispert
- Automatic post-editors (APEs) enable the re-use of black box machine translation (MT) systems for a variety of tasks where different aspects of translation are important. In this paper, we describe APEs that target adequacy errors, a critical problem for tasks such as cross-lingual question-answering, and compare different approaches for post-editing: a rule-based system and a feedback approach that uses a computer in the loop to suggest improvements to the MT system. We test the APEs on two different MT systems and across two different genres. Human evaluation shows that the APEs significantly improve adequacy, regardless of approach, MT system or genre: 30-56% of the post-edited sentences have improved adequacy compared to the original MT.
March 2012 -- Marcus Tomalin seminar at University of Edinburgh
- Marcus Tomalin gave a seminar titled `In Search of `Natural' Speech: Grammaticality, Acceptability, and Speech Technology' at the Edinburgh Linguistics Circle, March 2012
- Although state-of-the-art large vocabulary Automatic Speech Recognition (ASR) and Statistical Machine Translation (SMT) systems often achieve impressive Word Error Rates (WERs) and BLEU scores respectively, end-users frequently consider the word sequences output by such systems to be `unnatural'. The perceived `unnaturalness' usually results from the accumulation of many small linguistic errors (e.g., lack of subject-verb agreement, partially scrambled syntax, homophonic substitution). Consequently, in recent years there has been a renewed interest in improving the `naturalness' of ASR and SMT output, even in systems that produce good WER and BLEU scores.
In this talk, the perceived `naturalness' of ASR and SMT transcriptions will be considered in the context of on-going debates about grammaticality and acceptability. An experimental framework for exploring these aspects of ASR/SMT transcriptions is described, and a methodology for improving the `naturalness' of such outputs is presented. The simplest ways of modifying an input word sequence are insertion, permutation, deletion, and substitution, and the approach adopted in this work makes use of a Combinatory Categorial Grammar (CCG) text generation system which enables input word sequences to be modified so as to improve their `naturalness'. It is shown that the output produced by the CCG-based system is considerably improved if the N-best generated hypotheses are rescored and reranked using Ngram-based techniques.
Yue Zhang appointed to Assistant Professor
Dr. Yue Zhang will take up position as Assistant Professor at Singapore University of Technology and Design with effect from July 2012. Yue has been a Research Associate at the Cambridge Computer Laboratory working on parsing and natural language generation for MT as part of the FAUST project.
February 2012 -- Article to appear in Speech Communication
- Impacts of machine translation and speech synthesis on speech-to-speech translation
Kei Hashimoto, Junichi Yamagishi, William Byrne, Simon King, Keiichi Tokuda
26--27 January 2012 -- Short Course on Weighted Finite State Transducers in Statistical Machine Translation
- Bill Byrne will give a six-lecture course on Weighted Finite State Transducers in Statistical Machine Translation at the 2012 International Winter School in Language and Speech Technologies, Tarragona, Spain.
20 January 2012 -- Seminar
- W. Byrne. Hierarchical phrase-based translation representations. Workshop on More Structure for Better Statistical Machine Translation?, University of Amsterdam, Netherlands, January 2012.
Postdoctoral Research Opportunities in SMT
- Job Posting -- Research Associate in Statistical Machine Translation
- Closing date is 30 December 2011
Marcus Tomalin joins the FAUST project
- Marcus will work on shallow generation for fluency in MT
2011 Visitors
October 3--9 -- Brian Roark, Oregon Health & Science University (OHSU).
October 10--11 -- Markos Mylonakis, University of Amsterdam.
September 2011
- Graeme Blackwood (CUED PhD 2010) starts as Research Staff Member in the Machine Translation Group at the IBM T.J. Watson Research Center on the 3rd of October.
- Cambridge Spanish-English and French-English interactive SMT systems are running on Reverso Labs
Real-time SMT systems based on Cambridge's HiFST decoder.
Try our systems and the other FAUST MT systems at http://labs.reverso.net
2011 Summer Internships
- Juan Pino is in Mountain View, CA (USA) at Google, Inc. working on morphology in Russian MT
- Matt Shannon is in London at Google, Inc. working on HMM-based speech synthesis
July 2011 -- Paper to be presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP'11) -- joint work with Google Research
Hierarchical Phrase-based Translation Representations.
Gonzalo Iglesias, William Byrne, Adrià de Gispert, Department of Engineering, University of Cambridge
Cyril Allauzen, Michael Riley, Google Research
19 July 2011 -- Bill Byrne discusses interactive machine translation on the BBC World Service Radio Programme 'Click'
The FP7 FAUST project is featured in a discussion on openness on the internet.
Listen: BBC Programme website, with audio
FAUST project website: http://faust-fp7.eu
An extended version of the interview broadcast on Click on BBC World Service Radio, 19th July 2011, is available on The Open University website.
June 2011 -- Cambridge Spanish-English interactive SMT systems are running on Reverso Labs
Real-time SMT systems based on Cambridge's HiFST decoder.
Try our systems and the other FAUST MT systems at http://labs.reverso.net
April 2011 -- Gonzalo Iglesias, EAMT Best Thesis Awardee 2010
From the EAMT website http://www.eamt.org/news/news_best_thesis_winner.php :- Dr. Gonzalo Iglesias has received a prize of €500 and has been granted a €200 bursary so that he can present a summary of his thesis at the Annual Conference of the EAMT (EAMT-2011) which will take place in Leuven, on May 30-31, 2011.
2011 Talks and Presentations
- A. de Gispert. Hierarchical Phrase-Based Translation at University of Cambridge. Talk at Barcelona Media Innovation Centre, Barcelona, Catalonia (Spain), July 2011.
- A. de Gispert. Hierarchical Phrase-Based Translation at University of Cambridge. Talk at Catalonia Research Group on Accessibility and Ambient Intelligence (CaiaC), Universitat Autònoma de Barcelona, Bellaterra, Catalonia (Spain), July 2011.
- A. de Gispert. Hierarchical Phrase-Based Representations: Decoding with Push-Down Transducers and Entropy-Pruned Language Models. Talk at DARPA GALE PI Meeting, Arlington, VA (USA), May 2011.
- A. de Gispert. Hierarchical Phrase-Based Translation at University of Cambridge. Talk at Google Research Labs, Mountain View, CA (USA), May 2011.
- G. Blackwood. Minimum Bayes-Risk Lattice Rescoring Methods for Statistical Machine Translation. Natural Language Processing Seminar, Computer Lab, University of Cambridge. May 2011.
- G. Blackwood. Lattice Rescoring Methods for Statistical Machine Translation. Talk at SRI, Menlo Park, CA (USA), April 2011.
- G. Iglesias. Hierarchical Phrase-based Translation with Weighted Finite State Transducers. Invited Presentation and Best Thesis Award for 2010. 15th Annual Conference of the European Association for Machine Translation, Leuven, Belgium, May 2011.
- G. Blackwood. Lattice Rescoring Methods for Statistical Machine Translation. Talk at IBM TJ Watson Research Labs, Yorktown Heights, NY (USA). February, 2011.
September 2010 -- Paper published in Computational Linguistics
Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Eduardo Banga, William Byrne. Hierarchical Phrase-based Translation with Weighted Finite State Transducers and Shallow-N Grammars. Computational Linguistics 36(3):505-533. September 2010.
2010 Talks and Presentations
- William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Keynote speech at IWSLT 2010, Paris (France), December 2010.
- William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Natural Language Processing Group, Department of Computer Science, University of Sheffield, UK, December 2010.
- William Byrne. Recent research in statistical machine translation. Winton Capital Management Internal Research Conference, November 2010. Invited presentation.
- William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Dublin Computational Linguistics Research Seminar, Dublin, Ireland, November 2010.
- Matthew Gibson and William Byrne. EMIME project overview. European Commission Information Society Conference (ICT 2010), Brussels, Belgium, September 2010.
- A. de Gispert. Hierarchical Phrase-Based Translation with weighted finite state transducers. Talk at IST / INESC-id, Lisbon (Portugal), July 2010.
- William Byrne and Adrià de Gispert. Fast Hiero grammars. DARPA GALE PI Meeting, Scottsdale, AZ, USA, April 2010.
- William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Columbia University, New York, NY, USA, April 2010.
- William Byrne. Hierarchical phrase-based translation with weighted finite state transducers. Google, Inc, Mountain View, CA, USA, April 2010.
- William Byrne. FAUST project overview. ICT-FP7 Language Technology Days, Luxembourg, March 2010.
2010 Conference Papers
- Adrià de Gispert, Juan Pino, William Byrne. Hierarchical phrase-based translation grammars extracted from alignment posterior probabilities. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, 2010.
- Graeme Blackwood, Adrià de Gispert, William Byrne. Fluency Constraints for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices. Proceedings of the International Conference on Computational Linguistics (COLING) 2010.
- Juan Pino, Gonzalo Iglesias, Adrià de Gispert, Graeme Blackwood, Jamie Brunning and William Byrne. The CUED HiFST System for the WMT10 Translation Shared Task. ACL 2010 Joint Fifth Workshop on Statistical Machine Translation.
- Graeme Blackwood, Adrià de Gispert, William Byrne. Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices. Proc. Annual Meeting of the Association for Computational Linguistics (ACL) 2010.
- Mikko Kurimo, et al. Personalising speech-to-speech translation in the EMIME project. Proc. Annual Meeting of the Association for Computational Linguistics (ACL) 2010 (Demo session).
5 April 2010
Gonzalo Iglesias starts as an RA on the FAUST project
2010 Google Research Award
Weighted Finite State Transducers in Hierarchical Phrase-Based Translation
PI: Bill Byrne
Google contact: Michael Riley
WMT 2010 Shared Translation Tasks
Excellent performance in translation between Spanish, French, and English
New FP7 research project on interactive statistical machine translation
- FAUST: Feedback Analysis for User Adaptive Statistical Translation (FP7-ICT-2009-4 STREP)
- Project Fact Sheet
