Dr Serge Sharoff


I joined the University of Leeds in 2003. Before that, I obtained my PhD in 1997 from the Moscow Lomonosov State University, held postdoctoral appointments at the Russian Research Institute for Artificial Intelligence (1997–2000), and was a Humboldt Research Fellow at the University of Bielefeld (2001–2002).

Research interests

Artificial Intelligence, and more specifically Large Language Models such as ChatGPT, have recently had a profound impact on how we interact with computers by making it possible to produce new texts in response to prompts. Fundamental research in this area is at the core of my expertise: one of my papers, on the diversity of texts on the Web, is cited by the creators of GPT.

My research interests are related to three domains: linguistics (primarily computational linguistics and corpus linguistics), cognitive science and communication studies.

Probably the most interesting part of my recent research is the digital curation of representative corpora automatically collected from the Web, i.e., annotating them in terms of genres, domains or morphosyntactic categories. The current set of resources includes very large corpora for Arabic, Chinese, English, French, German, Italian, Polish, Portuguese, Russian and Spanish.

I am happy to consider applications from prospective PhD students in the area of my expertise. I am particularly interested in the following general topics:

Automatic Text Classification for Translation

Setting up a translation project usually involves estimating the time required to translate a text and selecting the most suitable translator. Modern approaches in Language Technology can do wonders with text processing, but it is not clear how helpful they can be in translation settings. For example, can they help to determine the genre of a text, its difficulty, or its suitability for particular translators? Similar text classification tools can also be used for tasks related to learning foreign languages.

Background references:

Language adaptation for improving models of lesser-resourced languages

A translation model needs to be applicable to a large number of languages, while training resources and linguistic models are often well developed for only some of them. Language adaptation can be designed in a way similar to domain adaptation: the models of lesser-resourced languages are improved by taking into account the resources available for closely related languages, e.g., adapting from French to Romanian. This can be applied in a range of training scenarios, such as part-of-speech tagging, text classification and translation quality prediction.

Background references:

Non-parallel resources for translation

Modern Machine Translation is based on "plagiarising" large amounts of existing translations, which usually come from institutions such as the United Nations or the European Parliament. This is not enough for many language directions or for specific domains, such as biomedicine. What methods can productively mine information about translations from non-parallel texts, such as Wikipedia articles on the same topic or newswire streams in different languages?

Background references:

Research projects

Any research projects I'm currently working on will be listed below. Our list of all research projects (https://ahc.leeds.ac.uk/dir/research-projects) allows you to view and search the full list of projects in the faculty.

Student education

I teach courses on:

  • Computer-Assisted Translation: Translation Memories, terminology extraction and management, Machine Translation
  • Corpus methods for translators: using corpus tools to solve translation problems
  • Introduction to Natural Language Processing: using computers to model language

Research groups and institutes

  • Centre for Translation Studies
  • Language documentation
  • Translation
  • Leeds Russian Centre
  • Russian
  • Centre for Endangered Languages, Cultures and Ecosystems

Current postgraduate researchers

Postgraduate research opportunities

We welcome enquiries from motivated and qualified applicants from all around the world who are interested in PhD study. Our research opportunities page (https://phd.leeds.ac.uk) allows you to search for projects and scholarships.