The French National Research Agency Projects for science


ANR funded project

Interactions des mondes physiques, de l'humain et du monde numérique (DS0707) 2014
KEHATH Project

Advanced quality methods for post-edition of machine translation

The translation community has seen a major change over the last five years: machine translation has become good enough that it is now advantageous for translators to post-edit it rather than translate from scratch. This is due to recent progress in statistical machine translation, that is, the training of a translation engine on a corpus of existing translations. Current methods for enhancing machine translation (MT) systems from human post-edition (PE) of raw outputs are somewhat efficient yet rather basic: the post-edited output is added to the training corpus and the translation model and language model are re-trained, with no clear view of how much has been improved and how much is left to be improved. In this approach, only the final PE result is used; no other user feedback on the raw MT quality is provided, such as the cognitive processes of the post-editor or a log of the post-edition actions performed. The KEHATH project intends to address these issues in two ways:
Firstly, we will leverage advanced machine learning (ML) techniques in the MT+PE loop. Our goal is to boost the impact of PE, that is, to reach the same performance with less PE or better performance with the same amount of PE. In other words, we want to improve machine translation learning curves. For this purpose, active learning and reinforcement learning techniques will be proposed and evaluated. In the industrial context of KEHATH, we will have to face challenges such as the heterogeneity of MT systems (statistical and/or rule-based) and the scalability of ML algorithms for improving a domain-specific MT system.
Secondly, quality prediction (QP) on MT outputs is crucial for translation project managers. Over the years, we have developed a number of confidence estimation and error detection techniques in the laboratory, and we will implement and evaluate them in real-world conditions. A shared concern will be to work on continuous domain-specific data flows to improve both MT and the performance of indicators for quality prediction.
The overall goal of the KEHATH project is straightforward: gain additional machine translation performance as fast as possible in each and every new industrial translation project, so that post-edition time and cost are drastically reduced. Basic research is the best way to reach this goal, for an industrial impact that is powerful and immediate.
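
As an illustration only, the active learning idea mentioned in the first point can be sketched as uncertainty sampling: given a limited post-edition budget, send the segments with the lowest predicted quality to the post-editor first. This is not the project's actual method; the `Segment` type, the `confidence` score and the `select_for_post_editing` function are hypothetical names introduced for this sketch, and the scores in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str        # source-language sentence
    mt_output: str     # raw machine translation
    confidence: float  # hypothetical quality-prediction score in [0, 1]


def select_for_post_editing(segments, budget):
    """Return up to `budget` segments with the lowest predicted quality."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Uncertainty sampling: least-confident segments come first.
    ranked = sorted(segments, key=lambda s: s.confidence)
    return ranked[:budget]


if __name__ == "__main__":
    # Illustrative pool; the confidence values are invented for the example.
    pool = [
        Segment("Bonjour", "Hello", 0.95),
        Segment("Il pleut des cordes", "It rains ropes", 0.20),
        Segment("Merci beaucoup", "Thank you very much", 0.90),
        Segment("Avoir le cafard", "To have the cockroach", 0.10),
    ]
    for seg in select_for_post_editing(pool, budget=2):
        print(f"{seg.confidence:.2f}  {seg.source} -> {seg.mt_output}")
```

In such a loop, the post-edited versions of the selected segments would be added to the training corpus before re-training, so the limited PE effort is spent where the engine is presumed weakest.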

Partners

LIG Laboratoire d'Informatique de Grenoble

L&M Lingua et Machina

LIFL UNIVERSITE LILLE I

ANR grant: 498 844 euros
Beginning and duration: October 2014 - 42 months

 

ANR Programme: Interactions des mondes physiques, de l'humain et du monde numérique (DS0707) 2014

Project ID: ANR-14-CE24-0016

Project coordinator:
Mr François Brown De Colstoun (Lingua et Machina)

 


The project coordinator is the author of this abstract and is therefore responsible for the content of the summary. The ANR disclaims all responsibility in connection with its content.