Bubble Milk Tea MT Project


Overview


This pilot project estimates the work involved in training a neural machine translation (NMT) engine to translate Lee Kum Kee (LKK) recipes from Chinese into English. The post-edited machine translation (PEMT) should meet the following criteria:

1) Efficiency
2) Reduced cost
3) High quality


Team Members

Our team consists of Chih-Yuan Chen, Ruby Lee, Hannah Liu, and me. For this project, we formed an imaginary company called Bubble Milk Tea.

Project Specifications

For this project, our team used Microsoft Custom Translator to train an NMT engine for Lee Kum Kee (LKK) recipes from Chinese into English. We ran a total of 10 rounds of training. Our goals were for post-editing machine translation (PEMT) to be 30% faster and 30% cheaper than human translation (HT), and for the resulting recipes to be simple, clear, and user-friendly.
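The two 30% targets above come down to simple arithmetic on time and cost per recipe. The sketch below shows one way to check them; the function names and all figures are hypothetical and are not measurements from this project.

```python
def relative_saving(baseline: float, observed: float) -> float:
    """Return the fraction saved relative to the baseline (0.30 means 30%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - observed) / baseline


def meets_target(baseline: float, observed: float, target: float = 0.30) -> bool:
    """Return True when the saving reaches the target fraction."""
    return relative_saving(baseline, observed) >= target


# Hypothetical figures for illustration only (per recipe).
ht_minutes, pemt_minutes = 60.0, 40.0  # translation time: HT vs. PEMT
ht_cost, pemt_cost = 50.0, 36.0        # translation cost: HT vs. PEMT

time_ok = meets_target(ht_minutes, pemt_minutes)
cost_ok = meets_target(ht_cost, pemt_cost)
print(f"time goal met: {time_ok}, cost goal met: {cost_ok}")
```

With these made-up numbers the time goal is met (about 33% faster) while the cost goal is not (28% cheaper), which shows why the two targets need to be tracked separately.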


Proposal







Project Process


a) Data Collection & Alignment

Because our team could not obtain the original recipe files, we manually collected each recipe from the website in both its Chinese and English versions and saved it as a plain-text (.txt) file. We then aligned the source-language (SL) and target-language (TL) segments into a TMX file for later training.
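The alignment step can be sketched in a few lines of Python. This is a minimal illustration, not the tool our team actually used: it assumes one segment per line in each text file, and the language codes ("zh", "en"), header values, and file names are placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_lines(path):
    """Read non-empty lines from a UTF-8 text file, one segment per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def align_files(src_path, tgt_path):
    """Pair segments line by line; refuse to continue if the counts differ."""
    src, tgt = read_lines(src_path), read_lines(tgt_path)
    if len(src) != len(tgt):
        raise ValueError(f"segment count mismatch: {len(src)} vs {len(tgt)}")
    return list(zip(src, tgt))


def build_tmx(pairs, src_lang="zh", tgt_lang="en"):
    """Build a TMX 1.4 tree from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "manual-alignment-sketch",  # placeholder value
        "creationtoolversion": "0.1",               # placeholder value
        "segtype": "sentence",
        "o-tmf": "txt",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)
```

A typical call would be `build_tmx(align_files("recipe_zh.txt", "recipe_en.txt")).write("recipe.tmx", encoding="utf-8", xml_declaration=True)`, where the file names are hypothetical. Raising an error on a count mismatch is a deliberate choice: a single skipped line would otherwise shift every later pair out of alignment.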

b) Microsoft Custom Translator, PEMT, and Human Evaluation

Our team conducted 10 rounds of training. After the first and the last rounds, we also performed PEMT to test whether the translations had improved. In addition, after each PEMT we gave a survey (two in total) to our friends to find out whether they could understand the recipes clearly and felt confident enough to make the dish. The surveys served as a human evaluation of the quality of the translated recipes.

Deliverables

Updated proposal for training an NMT engine for LKK recipes, including:
1) Details of the data used in training
2) Details of each iteration and resulting BLEU score
3) Estimate of the feasibility and time/cost required to achieve the goals stated under Objective
4) Suggestions for further training of the NMT engine

Updated Proposal


Lessons Learned

1) Invest time in making sure all the segments are fully aligned in .tmx format before training
2) Avoid using dictionaries in training rounds
3) (Client-side) Provide original editable files
4) Customize segmentation rules, add a termbase for post-editing, and find and flag Chinese characters in the CAT tool
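The first and fourth lessons lend themselves to an automated check before training or post-editing. The sketch below is only an illustration under stated assumptions: it expects a TMX file whose language codes are "zh" and "en", the function name is hypothetical, and the regular expression covers only the main CJK Unified Ideographs block. The sample segments are made up.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Main CJK Unified Ideographs block only; extension blocks are not covered.
CJK = re.compile(r"[\u4e00-\u9fff]")


def check_tmx(root, src_lang="zh", tgt_lang="en"):
    """Return (unit number, problem) tuples for suspicious translation units."""
    problems = []
    for index, tu in enumerate(root.iter("tu"), start=1):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            text = (seg.text or "").strip() if seg is not None else ""
            segs[tuv.get(XML_LANG)] = text
        source, target = segs.get(src_lang, ""), segs.get(tgt_lang, "")
        if not source or not target:
            problems.append((index, "missing or empty segment"))
        elif CJK.search(target):
            problems.append((index, "Chinese characters left in target"))
    return problems


# Made-up sample with one clean unit and two problem units.
sample = ET.fromstring(
    '<tmx version="1.4"><body>'
    '<tu><tuv xml:lang="zh"><seg>加入蠔油</seg></tuv>'
    '<tuv xml:lang="en"><seg>Add oyster sauce</seg></tuv></tu>'
    '<tu><tuv xml:lang="zh"><seg>拌勻</seg></tuv>'
    '<tuv xml:lang="en"><seg>Stir 拌勻 well</seg></tuv></tu>'
    '<tu><tuv xml:lang="zh"><seg>上碟</seg></tuv>'
    '<tuv xml:lang="en"><seg></seg></tuv></tu>'
    '</body></tmx>'
)
problems = check_tmx(sample)
for unit, problem in problems:
    print(f"tu {unit}: {problem}")
```

For a real file, `ET.parse("recipe.tmx").getroot()` would replace the inline sample; the file name is a placeholder.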

Once the engine is fully trained, it is reasonable to expect reliable NMT output. Awkward expressions in the cooking instructions will be corrected so that they do not cause confusion, and terminology will be standardized to avoid ambiguity. As a result, anyone who reads the recipes should be able to prepare the correct ingredients and seasonings and complete the desired dish.


Final Video



