Professor Bogdan Babych


I am a Professor of Translation Studies at Heidelberg University, Germany, and a Visiting Research Fellow in the School of Languages, Cultures and Societies at the University of Leeds.

I work in Computational Linguistics and Natural Language Processing. I have published papers on evaluating and improving the quality of Machine Translation (MT) with Information Extraction techniques, extracting translation equivalents from large non-parallel corpora, developing MT for under-resourced languages, hybrid MT, computational models for Ukrainian and Russian morphosyntax, machine translation to and from Slavonic languages, and using MT to support language learning and authoring in a non-native language.

I previously worked as a computational linguist at L&H Speech Products, Belgium. I hold a PhD in Machine Translation from the University of Leeds and a Candidate of Sciences degree in Ukrainian Linguistics from the Ukrainian National Academy of Sciences. I coordinated the FP7 Marie Curie project HyghTra on developing a new hybrid MT architecture. I have also worked on other EU and UK projects: ASSIST (Automated semantic assistance for translators), ACCURAT (enhancing MT using comparable corpora for under-resourced languages), TTC (mining translation terminology from comparable corpora), and IntelliText (Intelligent Tools for Creating and Analysing Electronic Text Corpora for Humanities Research). In 2007 I won a Leverhulme Early Career Research Fellowship for my project “Translation Strategies in Comparable Corpora”.

My personal webpage at the School of Computing - Natural Language Processing group:

My webpage on GitHub is

Research projects

(For details see Projects section on my personal webpage)

EU FP7 Marie Curie IAPP project HyghTra (2010-2014)
Project: Hybrid high-quality translation system
Role: Coordinator

EU FP7 ICT Project ACCURAT (2010-2012)
Project: Analysis and evaluation of Comparable Corpora for Under Resourced Areas of machine Translation
Role: Principal Investigator for Leeds team

Leverhulme Early Career Research Fellowship (2007-2009)
Project: Translation Strategies in Comparable Corpora
Role: Principal Investigator

Research and industrial collaboration

Detailed list of publications and CV
Industrial collaboration
Links and personal information

Research Student Supervision

I am interested in supervising PhD and MA by research students in a range of areas and topics, which include:

Machine Translation and Computer-Assisted Translation

Evaluation of Machine Translation
Linguistic models for Machine Translation
MT in the workflow of professional translators and translation companies
Improving MT quality with Computational Linguistics (CL) technologies
Emerging CL technologies for Computer-Assisted Translation and Interpreting
Collaborative translation workflow

Computational and Corpus Linguistics

Multiword expressions and phraseology
Computational models of discourse
Computational aspects of Slavonic languages
Computational models of morphosyntax
Corpus linguistics
Computational complexity of language
Tree Adjoining Grammars and other mildly context-sensitive formalisms
Corpus-based translation studies
Computational Linguistics methods for research in humanities

Slavonic Languages and General Linguistics

Ukrainian linguistics
Morphosyntax of Slavonic Languages
Linguistic constructions and their formal models

Please send an email, or your proposal with CV to
(Programming skills and/or knowledge of statistical packages are an advantage)

Citations, indices and videolectures

Google Scholar citations
ACM citations
Citeseer citations
MT Archive publications (with PDFs)
Dblp index
Humbox profile
Video of a talk at ACL 2007, Prague: Assisting Translators in indirect lexical transfer

Selected publications

Full list available on the Publications webpage:

Babych, Bogdan (2019, August). Unsupervised Induction of Ukrainian Morphological Paradigms for the New Lexicon: Extending Coverage for Named Entities and Neologisms Using Inflection Tables and Unannotated Corpora. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing at ACL-2019 (pp. 1-11). (pdf)

Bogdan Babych, Fangzhong Su, Anthony Hartley, Ahmet Aker, Monica Lestari Paramita, Paul Clough, Robert Gaizauskas (2019). Cross-Language Comparability and its applications for MT. In: Using Comparable Corpora for Under-Resourced Areas of Machine Translation. An account of the results from the project ACCURAT and beyond. Springer. - 44 pp.

Reinhard Rapp, Vivian Xu, Tatiana Gornostay, Olga Vodopiyanova, Andrejs Vasiļjevs, Klaus-Dirk Schmitz, Michael Zock, Serge Sharoff, Richard Forsyth, Bogdan Babych (2019). New areas of application of Comparable Corpora. In: Using Comparable Corpora for Under-Resourced Areas of Machine Translation. An account of the results from the project ACCURAT and beyond. Springer. - 32 pp.

Babych, B. (2017). Unsupervised induction of morphological lexicon for Ukrainian. In: Proc. of CAMRL2017: Workshop on Computational Approaches to Morphologically Rich Languages. Leeds. 5 July 2017

Yu Yuan, Bogdan Babych and Serge Sharoff (2017). Reference-free System for Automated Human Translation Quality Estimation. In: Proc. of 12th Iberian Conference on Information Systems and Technologies (CISTI), 21-24 June 2017

Babych, B. (2017). Deconstruction of the Russian propaganda discourse in military history: Identifying and neutralizing linguistic means of falsifying history of the Ukrainian division “Halychyna”. In: Proc. of 2nd International forum on crisis communications: “Information stream models as tactical instruments of communication content security”. Military Institute of Taras Shevchenko National University, Kyiv, 22-23 May 2017, pp.: 80-89

Babych, B. (2016). Graphonological Levenshtein Edit Distance: Application for automated cognate identification. Baltic Journal of Modern Computing. Vol.4 (2016), No.2 (EAMT-2016 volume), 115-128 [pdf] (preprint)

Babych, B., Sharoff, S. (2016). Rapid induction of morphological disambiguation resources from a closely related language. Fifth Workshop on Hybrid Approaches to Translation (HyTra-5) [pdf]

Babych, B. (2016) A hybrid machine translation system between English, Ukrainian, Arabic and Russian for automated terrorist activity detection. In: Proc. of XII International Conference "Military education and science: the present and the future" Military Institute of Taras Shevchenko National University, 25 November 2016, Kyiv, Ukraine.

Yuan, Y., Sharoff, S. and Babych, B. (2016). MoBiL: A hybrid feature set for Automatic Human Translation quality assessment. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia.

Babych, B. (2016). Nuclear weapons of ideological warfare: Ukraine's revolution and war poetry (2013- ) as translation of the totalitarian discourse. Sociology of poetry translation conference, June 2016, University of Leeds.

Babych, B. (2016). Deconstruction of the totalitarian discourse as a factor of country’s information security: Ukrainian post-2013 literature of Revolution and war as resistance to Russia’s hybrid aggression in the cultural information space. Proc. of International forum on crisis communications, Military Institute of National Taras Shevchenko University, Kyiv, Ukraine, 9-10 June 2016, pp.: 153-157

Babych, B. and Atwell, E. (2015). Multilingual Information Extraction framework for real-time detection of terrorist propaganda threats in on-line communication. In: Proc. of XI International Conference "Military education and science: the present and the future" Military Institute of Taras Shevchenko National University, 27 November 2015, Kyiv, Ukraine. Abstract (en) [pdf]; Abstract (uk) [pdf]; Powerpoint (en) [ppt]; Powerpoint (uk) [ppt]

(2014) Bogdan Babych, Jonathan Geiger, Mireia Ginestí Rosell, Kurt Eberle. Deriving de/het gender classification for Dutch nouns for rule-based MT generation tasks. In Proc. of EACL 2014 Third Workshop on Hybrid Approaches to Translation (HyTra).

(2012) Bogdan Babych, Anthony Hartley, Kyo Kageura, Martin Thomas, & Masao Utiyama: MNH-TT: a collaborative platform for translator training. [Aslib 2012] Translating and the Computer 34, 29-30 November 2012, One Birdcage Walk, London, UK; 18pp. [PDF, 1710KB]; presentation by Martin Thomas: 41 slides [PDF, 8772KB]

(2012) Kurt Eberle, Bogdan Babych, Johanna Geiß, Mireia Ginestí-Rosell, Anthony Hartley, Reinhard Rapp, Serge Sharoff, & Martin Thomas: Design of a hybrid high quality machine translation system. EACL Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra): Proceedings of the workshop, 23-24 April 2012, Avignon, France; pp.101-112. [PDF, 381KB]

(2012) Mārcis Pinnis, Radu Ion, Dan Ștefănescu, Fangzhong Su, Inguna Skadiņa, Andrejs Vasiļjevs, & Bogdan Babych: ACCURAT toolkit for multi-level alignment and information extraction from comparable corpora. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.91-96. [PDF, 235KB]

(2012) Reinhard Rapp, Serge Sharoff, & Bogdan Babych: Identifying word translations from comparable documents without a seed lexicon. EACL Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra): Proceedings of the workshop, 23-24 April 2012, Avignon, France; pp.10-19. [PDF, 188KB]

(2010) Jo Drugan & Bogdan Babych: Shared resources, shared values? Ethical implications of sharing translation resources. JEC 2010: Second joint EM+/CNGL Workshop “Bringing MT to the user: research on integrating MT in the translation industry”, AMTA 2010, Denver, Colorado, November 4, 2010; pp.3-9. [PDF, 9,433KB]

(2009) Bogdan Babych, Anthony Hartley, & Serge Sharoff: Evaluation-guided pre-editing of source text: improving MT-tractability of light verb constructions. LREC 2008: 6th Language Resources and Evaluation Conference, Marrakech, Morocco, 26-30 May 2008; 4pp. [PDF, 57KB]

(2007) Bogdan Babych, Anthony Hartley, & Serge Sharoff: A dynamic dictionary for discovering indirect translation equivalents. Translating and the Computer 29. Proceedings of the twenty-ninth international conference on Translating and the Computer, 29-30 November 2007 (London: Aslib, 2007); 10pp. [PDF, 150KB]

(2007) Bogdan Babych, Anthony Hartley, & Serge Sharoff: Translating from under-resourced languages: comparing direct transfer against pivot translation. MT Summit XI, 10-14 September 2007, Copenhagen, Denmark. Proceedings; pp.29-35 [PDF, 197KB]

(2007) Bogdan Babych & Anthony Hartley: Sensitivity of automated models for MT evaluation: proximity-based vs. performance-based methods. MT Summit XI Workshop: Automatic procedures in MT evaluation, 11 September 2007, Copenhagen, Denmark, [Proceedings]; 22pp. [PDF of PPT presentation, 150KB]

(2007) Bogdan Babych, Anthony Hartley, Serge Sharoff, & Olga Mudraya: Assisting translators in indirect lexical transfer. ACL 2007: proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, June 2007; pp. 136-143 [PDF, 285KB]

(2006) Serge Sharoff, Bogdan Babych, & Anthony Hartley: Using comparable corpora to solve problems difficult for human translators. Coling-ACL 2006: Proceedings of the Coling/ACL 2006 Main Conference Poster Sessions, Sydney, July 2006; pp.739-746. [PDF, 250KB]

(2006) Serge Sharoff, Bogdan Babych, & Anthony Hartley: Using collocations from comparable corpora to find translation equivalents. LREC-2006: Fifth International Conference on Language Resources and Evaluation. Proceedings, Genoa, Italy, 22-28 May 2006; pp.465-470 [PDF, 1104KB]

(2006) Serge Sharoff, Bogdan Babych, Paul Rayson, Olga Mudraya, & Scott Piao: ASSIST: automated semantic assistance for translators. EACL-2006: 11th Conference of the European Chapter of the Association for Computational Linguistics, Posters and demonstrations, Trento, Italy, April 5-6, 2006; pp.139-142 [PDF, 69KB]

(2005) Bogdan Babych, Anthony Hartley and Debbie Elliott: Estimating the predictive power of n-gram MT evaluation metrics across language and text types. MT Summit X, Phuket, Thailand, September 13-15, 2005, Conference Proceedings: the tenth Machine Translation Summit; pp.412-418. [PDF, 180KB]

(2005) Bogdan Babych: Information extraction technology in machine translation: IE methods for improving and evaluating MT quality. PhD thesis, University of Leeds, Centre for Translation Studies, March 2005. 186pp. [PDF, 859KB]

(2004) Bogdan Babych, Debbie Elliott, and Anthony Hartley: Extending MT evaluation tools with translation complexity metrics. Coling 2004: 20th International Conference on Computational Linguistics, 23-27 August 2004, University of Geneva, Switzerland, Proceedings; 7pp. [PDF, 68KB]

(2004) Bogdan Babych & Anthony Hartley: Extending the BLEU MT evaluation method with frequency weightings. ACL 2004: 42nd annual meeting of the Association for Computational Linguistics: Proceedings of the conference, 21-26 July 2004, Barcelona, Spain; pp. 621-628. [PDF, 132KB]

(2004) Bogdan Babych, Debbie Elliott, & Anthony Hartley: Calibrating resource-light automatic MT evaluation: a cheap approach to ranking MT systems by the usability of their output. LREC-2004: Fourth International Conference on Language Resources and Evaluation, Proceedings, Lisbon, Portugal, 26-28 May 2004; pp.2031-2034. [PDF, 237KB]

(2004) Bogdan Babych & Anthony Hartley: Modelling legitimate translation variation for automatic evaluation of MT quality. LREC-2004: Fourth International Conference on Language Resources and Evaluation, Proceedings, Lisbon, Portugal, 26-28 May 2004; pp.833-836. [PDF, 283KB]

Research interests

Machine Translation; computational models for morphosyntax; Slavonic languages; Ukrainian.

Student education

I teach courses in Translation Technologies and translation theory, as well as Continuous Professional Development workshops for professional translators and international organizations.

Research groups and institutes

  • Centre for Translation Studies
  • Linguistics and Translation
  • Translation
  • Language processing
  • Language at Leeds
