Traduction d'unités polylexicales du portugais en français par MT@EC et eTranslation

Other Title(s)

Translation of multi-word units from Portuguese into French by MT@EC and eTranslation

Publication Date

2022-06-30

Country of Publication

Algeria

No. of Pages

Main Subjects

Languages
European languages

Topics

Abstract EN

This paper aims to determine the extent to which the shift from statistical machine translation (SMT) to neural machine translation (NMT) improved the performance of European Union machine translation systems between 2015 and 2021 in terms of multi-word unit translation and domain coverage.

To do so, we chose to test these systems on machine translation into French of multi-word units expressing quantitative and qualitative progression in Portuguese from Portugal.

These units consist of the 2-gram ‘cada vez’ and a comparative adjective or adverb (cada vez COMP), and their word-for-word translation into French is not idiomatic (*chaque fois COMP).

The most frequent translation into French is ‘de COMP en COMP’.

This implies that these multi-word units must be translated ‘en bloc’, but their identification is not straightforward.

On the one hand, COMP is not fixed and may consist of one word (mais / plus, menos / moins, maior / plus grand, menor / plus petit, melhor / meilleur-mieux, pior / pire-plus mal) or several words (mais or menos N, ADJ, ADV).

On the other hand, the 2-gram ‘cada vez’ can be part of other multi-word units expressing iteration (de cada vez (que) / (à) chaque fois (que)) or a ‘dropper’ reading ([a certain quantity] de cada vez / à la fois). This raises the challenge of ambiguity, well known to biotranslators and still often problematic for NMT.

Moreover, units expressing quantitative or qualitative progression may raise other translation challenges when they are coordinated (with or without repetition of the 2-gram ‘cada vez’), when they are split (cada vez (…) COMP), or when they combine with verbs or nouns to form extended translation units whose translation into French can result in a more concise solution, which we refer to as ‘lexicalisation’.
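The identification problem described above can be illustrated with a minimal sketch. It is not the author's procedure: the function name `classify_cada_vez`, the label strings, the `MAX_GAP` window and the example sentences are all assumptions made for illustration, and only the single-word comparatives listed in the abstract are covered.

```python
import re

# Single-word comparatives listed in the abstract.
COMPARATIVES = {"mais", "menos", "maior", "menor", "melhor", "pior"}

# Hypothetical limit on the number of words allowed between
# 'cada vez' and COMP in the split pattern (cada vez (...) COMP).
MAX_GAP = 3


def classify_cada_vez(sentence: str) -> list[str]:
    """Return one label per occurrence of the 2-gram 'cada vez'."""
    tokens = re.findall(r"\w+", sentence.lower())
    labels = []
    for i in range(len(tokens) - 1):
        if tokens[i] != "cada" or tokens[i + 1] != "vez":
            continue
        # 'de cada vez' signals the iterative or 'dropper' reading.
        if i > 0 and tokens[i - 1] == "de":
            labels.append("iteration_or_dropper")
            continue
        # Look for a comparative right after 'cada vez' or after
        # at most MAX_GAP intervening words.
        window = tokens[i + 2 : i + 3 + MAX_GAP]
        if any(tok in COMPARATIVES for tok in window):
            labels.append("progression")
        else:
            labels.append("other")
    return labels


# Invented example sentences, not taken from the study's corpus.
print(classify_cada_vez("Os preços estão cada vez mais altos."))
print(classify_cada_vez("Toma duas gotas de cada vez."))
```

A surface heuristic of this kind cannot resolve every case (for instance, ‘cada vez que’ without ‘de’ is simply labelled "other" here), which is consistent with the abstract's point that identification is not straightforward.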

We established a biotranslation model based on a manually aligned French-Portuguese parallel literary corpus and online searchable French-Portuguese aligned corpora (translation memories).

We selected a sample of occurrences of these multi-word units including several translation challenges.

These occurrences were selected from a Portuguese journalistic corpus.

They therefore belong to general language, whereas the EU's translation memories cover the domains dealt with by its institutions, which represents an additional challenge, considering the critical importance of domain coverage in the data to NMT performance quality.

The selected occurrences were translated into French by the EU SMT system in 2015 (MT@EC) and 2019 (eTranslation Legacy) and by eTranslation (the EU NMT system) in 2019 and 2021.

Firstly, MT output was analysed according to two general criteria: ‘non-literality’, that is, translation into French without ‘chaque’, and acceptability from a semantic point of view, that is, MT output without any false meaning, opposite meaning or nonsense.
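Of the two criteria, only non-literality lends itself to a mechanical check, since acceptability requires human semantic judgement. The following is a minimal sketch of such a check, assuming that the absence of the word ‘chaque’ in the French output is the whole test; the function names `is_non_literal` and `non_literality_rate` and the example outputs are hypothetical and do not come from the study.

```python
import re


def is_non_literal(french_output: str) -> bool:
    """True when the French MT output does not contain the word 'chaque'."""
    return re.search(r"\bchaque\b", french_output, re.IGNORECASE) is None


def non_literality_rate(outputs: list[str]) -> float:
    """Share of outputs that satisfy the non-literality criterion."""
    if not outputs:
        return 0.0
    return sum(is_non_literal(o) for o in outputs) / len(outputs)


# Invented MT outputs used only to exercise the functions.
sample = [
    "Les prix sont de plus en plus élevés.",
    "Les prix sont chaque fois plus élevés.",
]
print(non_literality_rate(sample))
```

Such a rate would only cover the first criterion; an output can avoid ‘chaque’ and still contain a false or opposite meaning.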

Then we looked at specific challenges, some of which could lead to original solutions, worthy of a professional human translator, such as lexicalisation, change of grammatical category or ‘recategorisation’, and ‘naturalisation’, that is, phraseological or syntactic rearrangement that makes the target text more idiomatic.

The results show that MT is improving, especially according to the criterion of non-literality.

Original solutions are still rare, but they are diversifying in NMT output.

Nevertheless, NMT remains imperfect, not least because of the inherent ambiguity of natural languages and the inevitable gaps in the data on which these systems are based.

The results also demonstrate the importance of human intervention in the maintenance of systems that learn automatically, since the quality of the SMT system’s output decreased between 2015 and 2019, when all efforts were focused on improving the EU NMT system.

Finally, the results reveal the dangers of using English as a pivot language when translating from one Romance language into another, and the need to train future translators in NMT and post-editing.

Abstract FRE

Cette étude vise à déterminer dans quelle mesure le passage de la traduction automatique statistique (TAS) à la traduction automatique neuronale (TAN) a amélioré les performances des systèmes de traduction automatique de l’Union européenne entre 2015 et 2021 en termes de traduction d’unités polylexicales et de couverture de domaines.

Pour ce faire, nous avons choisi de tester ces systèmes sur la TA en français d’unités polylexicales exprimant la progression quantitative et qualitative en portugais du Portugal.

Nous avons établi un modèle de biotraduction à partir de corpus parallèles et alignés français-portugais et nous avons sélectionné un échantillon d’occurrences de ces unités polylexicales comportant plusieurs défis de traduction.

Ces occurrences ont été prélevées sur un corpus journalistique portugais et soumises aux systèmes de TAS et de TAN de l’UE en 2015, en 2019 et en 2021.

Les résultats ont été analysés en fonction de deux critères généraux (non-littéralité et acceptabilité) et de défis particuliers pouvant donner lieu à des solutions originales, dignes d’un biotraducteur professionnel.

Il en découle que la TA s’améliore, mais reste imparfaite, notamment en raison de l’ambiguïté inhérente aux langues naturelles et du caractère inéluctablement lacunaire des données sur lesquelles se fondent ces systèmes.

Les résultats démontrent aussi l’importance de l’intervention humaine dans l’entretien de ces systèmes, les dangers de l’utilisation de l’anglais comme langue pivot lorsqu’il s’agit de traduire d’une langue romane à une autre et la nécessité d’initier les futurs traducteurs à la TAN et à la post-édition.

American Psychological Association (APA)

Bacquelaine, Francoise. 2022. Traduction d'unités polylexicales du portugais en français par MT@EC et eTranslation. Revue traduction et langues, Vol. 21, no. 1, pp. 56-76.
https://search.emarefa.net/detail/BIM-1442018

Modern Language Association (MLA)

Bacquelaine, Francoise. Traduction d'unités polylexicales du portugais en français par MT@EC et eTranslation. Revue traduction et langues Vol. 21, no. 1 (Aug. 2022), pp. 56-76.
https://search.emarefa.net/detail/BIM-1442018

American Medical Association (AMA)

Bacquelaine, Francoise. Traduction d'unités polylexicales du portugais en français par MT@EC et eTranslation. Revue traduction et langues. 2022. Vol. 21, no. 1, pp. 56-76.
https://search.emarefa.net/detail/BIM-1442018

Data Type

Journal Articles

Language

French

Notes

Includes bibliographical references: p. 74-76

Record ID

BIM-1442018
