Bubble Milk Tea MT Project
Overview
This pilot project aims to estimate the work involved in training a neural machine translation (NMT) engine for Lee Kum Kee (LKK) recipes from Chinese into English. Post-editing machine translation (PEMT) should meet the following criteria:
1) Efficiency
2) Reduced cost
3) High quality
Team Members
Our team consists of Chih-Yuan Chen, Ruby Lee, Hannah Liu, and me. For this project, we formed a fictional company called Bubble Milk Tea.
Project Specifications
Our team used Microsoft Custom Translator to train a neural machine translation (NMT) engine for Lee Kum Kee (LKK) recipes from Chinese into English, running a total of 10 rounds of training. Our goal was for post-editing machine translation (PEMT) to be 30% faster and 30% cheaper than human translation (HT), while making the recipes simple, clear, and user-friendly.
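To make the 30% targets concrete, the check below compares a PEMT job against a human-translation baseline. It is a minimal Python sketch: the `JobEstimate` type, the function names, and all figures are hypothetical, and it assumes "30% faster" means PEMT takes at least 30% less time than HT. (Reading it instead as 1.3 times the throughput would only require about 23% less time, since 1/1.3 is roughly 0.77.)

```python
from dataclasses import dataclass


@dataclass
class JobEstimate:
    """Hypothetical time (hours) and cost for translating one batch of recipes."""
    hours: float
    cost: float


def savings(ht: JobEstimate, pemt: JobEstimate) -> tuple[float, float]:
    """Return the fractional time and cost savings of PEMT relative to HT."""
    time_saving = (ht.hours - pemt.hours) / ht.hours
    cost_saving = (ht.cost - pemt.cost) / ht.cost
    return time_saving, cost_saving


def meets_targets(ht: JobEstimate, pemt: JobEstimate,
                  time_target: float = 0.30, cost_target: float = 0.30) -> bool:
    # Assumption: "30% faster" is read as "takes at least 30% less time".
    time_saving, cost_saving = savings(ht, pemt)
    return time_saving >= time_target and cost_saving >= cost_target


if __name__ == "__main__":
    ht = JobEstimate(hours=10.0, cost=200.0)   # hypothetical HT baseline
    pemt = JobEstimate(hours=6.5, cost=150.0)  # hypothetical PEMT figures
    print(savings(ht, pemt))
    print(meets_targets(ht, pemt))  # fails: the cost saving is only 25%
```

Keeping time and cost as separate checks reflects the two independent goals; a job that saves enough time but not enough money still fails.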
Proposal
a) Data Collection & Alignment
Since our team could not obtain the original recipe files, each of us manually collected recipes from the website (both the Chinese and English versions) and converted them to plain text (.txt) files. After obtaining the recipes, we aligned the source-language (SL) and target-language (TL) texts into a TMX file for training.
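The alignment step can be sketched as follows. This minimal Python example assumes the Chinese and English versions of a recipe have already been split into one segment per line, so that line i of one file corresponds to line i of the other; the function names, the `zh`/`en` language codes, the header values, and the sample segments are hypothetical. It refuses to build the TMX when the segment counts differ, because a count mismatch is the simplest sign of misaligned segments.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced key as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def align_lines(src_lines, tgt_lines):
    """Pair non-empty source and target lines one-to-one (naive positional alignment)."""
    src = [line.strip() for line in src_lines if line.strip()]
    tgt = [line.strip() for line in tgt_lines if line.strip()]
    if len(src) != len(tgt):
        raise ValueError(
            f"Segment count mismatch: {len(src)} source vs {len(tgt)} target"
        )
    return list(zip(src, tgt))


def build_tmx(pairs, srclang="zh", tgtlang="en"):
    """Wrap aligned (source, target) pairs in a minimal TMX 1.4 document."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "align-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
        for lang, text in ((srclang, src), (tgtlang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(root)


if __name__ == "__main__":
    # Hypothetical sample segments, one per line in each language.
    zh = ["將雞翼洗淨。", "加入蠔油拌勻。"]
    en = ["Rinse the chicken wings.", "Add the oyster sauce and mix well."]
    tree = build_tmx(align_lines(zh, en))
    ET.indent(tree)  # pretty-print; requires Python 3.9+
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Web recipe pages rarely line up line for line, so in practice the pairs produced this way would still need manual review before training.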
b) Microsoft Custom Translator, PEMT, and Human Evaluation
Our team conducted 10 rounds of training. For the first and last rounds, we also performed PEMT to test whether the translations had improved. In addition, after each PEMT we gave a survey to our friends (two surveys in total) to see whether they could understand the recipes clearly and felt confident making the dishes. The surveys served as a form of human evaluation to ensure the quality of the translated recipes.
Deliverables
Updated proposal for training an NMT engine for LKK recipes, including:
1) Details of the data used in training
2) Details of each iteration and resulting BLEU score
3) Estimate of the feasibility and time/cost required to achieve the goals stated above
4) Suggestions for further training of the NMT engine
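Custom Translator reports a BLEU score for each trained model, so the platform itself produces the per-iteration scores. For readers unfamiliar with the metric, the sketch below shows what such a score measures: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It is an illustrative Python implementation, not Custom Translator's code. It assumes one reference per segment, whitespace-tokenized English, and no smoothing, so its numbers will not match the platform's exactly; multiply the result by 100 to put it on the familiar 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: one reference per segment, uniform weights, no smoothing.

    Both arguments are lists of token lists. Returns a score in [0, 1].
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: each hypothesis n-gram earns credit at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    ref = "add the oyster sauce and stir well".split()
    hyp = "add the oyster sauce and stir".split()
    print(round(100 * corpus_bleu([hyp], [ref]), 1))
```

In the usage example every n-gram of the shorter output is correct, so the score falls below 100 only because of the brevity penalty.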
Updated Proposal
Lessons Learned
1) Invest time in making sure all segments are fully aligned in the .tmx file before training
2) Avoid using dictionaries in training rounds
3) (Client-side) Provide original editable files
4) Customize segmentation rules, add a termbase for post-editing, and find and flag Chinese characters in the CAT tool
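Two parts of lesson 4, flagging leftover Chinese characters in the English output and checking that termbase entries were applied, can be illustrated with simple automated checks. The Python sketch below is a minimal example under stated assumptions: the Unicode ranges, the function names, and the sample termbase entry are hypothetical, and a CAT tool's own QA features may already cover these checks.

```python
import re

# Assumed ranges: CJK punctuation, CJK Extension A, CJK Unified Ideographs,
# and halfwidth/fullwidth forms (e.g. fullwidth commas). Extend as needed.
CJK_PATTERN = re.compile(r"[\u3000-\u303f\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef]+")


def flag_cjk(segments):
    """Yield (index, matches) for English segments that still contain Chinese text or punctuation."""
    for i, seg in enumerate(segments):
        found = CJK_PATTERN.findall(seg)
        if found:
            yield i, found


def check_terms(pairs, termbase):
    """Yield (index, source term, expected term) when a termbase entry is not applied.

    Matching is a plain case-insensitive substring test, so inflected or
    reordered renderings will also be flagged for a human to review.
    """
    for i, (src, tgt) in enumerate(pairs):
        for zh_term, en_term in termbase.items():
            if zh_term in src and en_term.lower() not in tgt.lower():
                yield i, zh_term, en_term


if __name__ == "__main__":
    termbase = {"蠔油": "oyster sauce"}  # hypothetical termbase entry
    pairs = [("加入蠔油拌勻。", "Add 蠔油 and mix well。")]
    print(list(flag_cjk(tgt for _, tgt in pairs)))
    print(list(check_terms(pairs, termbase)))
```

Both checks only flag segments; deciding how to fix each one is still left to the post-editor.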
Once fully trained, the NMT engine can reasonably be expected to be reliable. Awkward expressions in the cooking instructions will be corrected so that they no longer confuse readers, and terminology will be standardized to avoid ambiguity. As a result, anyone who reads the recipes should be able to prepare the correct ingredients and seasonings and complete the desired dish.
Final Video