Ratnamogan, M. Pirashanth (2018). On using monolingual corpora and ensembling in machine translation. PFE - Project Graduation, ENSTA.
PDF (3295Kb): restricted to registered users only
Abstract
For the past year, the BNP Paribas Analytics Consulting team has made available an internal translation tool that makes it possible to translate documents without sharing client data outside the company. The tool relies on state-of-the-art supervised algorithms trained on millions of translation examples. Such data is scarce, so the team is exploring alternative methods. My internship focused mainly on unsupervised neural machine translation, in which a translator can be built without any translation examples. As a complement, I then applied semi-supervised methods based on reinforcement learning. Finally, I explored ensemble methods to combine the information provided by several models, and I built a pipeline that extracts parallel and monolingual data from the Common Crawl. The internship produced several extensions, based on the literature, that can be used within the team's fully supervised pipeline. I also explored several ideas for improving the methods I used.
Item Type: | Thesis (PFE - Project Graduation) |
---|---|
Subjects: | Mathematics and Applications |
ID Code: | 7174 |
Deposited By: | Pirashanth Ratnamogan |
Deposited On: | 27 March 2019 14:57 |
Last Modified: | 27 March 2019 14:57 |