Burthier, Mr Quentin (2021) Machine Translation of User-Generated Content PFE - Project Graduation, ENSTA.

Abstract

Machine Translation (MT), the task of automatically translating text, produces high-quality translations for some language pairs when the source text is written in a conventional form. However, many texts produced by internet users on forums or social media websites, called User-Generated Content (UGC), are not written in this conventional form, and machine translation systems fail to translate them correctly. In this work, we describe UGC and present the state of the art in making machine translation systems more robust to it. We provide experiments highlighting the shortcomings of state-of-the-art MT systems when translating UGC. We then show empirically that, for the MT systems we tested, segmenting the text into characters increases robustness to UGC compared with segmenting it into units containing multiple characters.

Item Type:Thesis (PFE - Project Graduation)
Subjects:Mathematics and Applications
ID Code:8353
Deposited By:Quentin Burthier
Deposited On:26 Jan 2021 16:36
Last Modified:26 Jan 2021 16:36