Authored by Tony Feng
Created on Nov 3rd, 2022
Last Modified on Nov 10th, 2022
Intro
This series of posts contains a summary of materials and readings from the course CSCI 1460 Computational Linguistics that I took @ Brown University. The class explores techniques behind recent advances in NLP with deep learning. I post these notes (what I’ve learnt) for study and review only.
MT Evaluation
Human Evaluation
- Explicit ratings for fluency and faithfulness
- Most reliable, but expensive to collect
- New collections are needed for each system
- Can’t be “hill climbed”
Automatic Evaluation
- NLP prefers automatic eval for standardization and optimization.
- Popular metrics for MT: BLEU, ESIM, BLEURT.
- However, once the system is sufficiently good, metrics stop correlating with human judgements.
BLEU
$$ BLEU=BP \times \exp \left(\frac{1}{N} \sum_{n=1}^{N} \log p_{n}\right) $$
- Assume we have an MT output (Candidate) and are comparing against multiple human-generated translations (Reference).
- Intuition: we should reward models for producing translations that contain many of the same words/phrases as the references.
$$ p_{n}=\frac{\sum_{c \in \text{cand}} \sum_{ngm \in c} \operatorname{count}_{clip}(ngm)}{\sum_{c^{\prime} \in \text{cand}} \sum_{ngm^{\prime} \in c^{\prime}} \operatorname{count}(ngm^{\prime})} $$
$$ BP = 1 \text{ if } c>r \text{ else } e^{1-r/c} \text{ if } c \leq r $$
where $BP$ is the brevity penalty ($c$ is the candidate length and $r$ is the effective reference length) and $p_n$ is the modified (clipped) n-gram precision.
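To make the formulas concrete, here is a minimal sentence-level BLEU sketch in Python. It is an illustration only: the toy sentences and the bigram-only demo call are my own assumptions, and real evaluations use library implementations (e.g., sacrebleu or NLTK) that also handle corpus-level aggregation and smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: clipped n-gram precisions + brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref_counts[g] = max(max_ref_counts[g], c)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        p_n = clipped / total if total > 0 else 0.0
        log_precisions.append(math.log(p_n) if p_n > 0 else float("-inf"))

    # Brevity penalty: compare against the reference length closest to the candidate.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda l: (abs(l - c), l))
    bp = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)

cand = "the cat sat on the mat".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(round(bleu(cand, refs, max_n=2), 3))  # 0.707 using unigram + bigram precision only
```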
Neural MT
Encoder-Decoder Model
- Also known as “sequence-to-sequence (seq2seq)” models.
- Intuition
  - “Conditional” text generation/language modeling.
  - The output is dependent on some input.
- Examples
  - RNN Encoder-Decoder (a minimal sketch follows this list)
  - Transformer Encoder-Decoder
- Many other models are inspired by this structure:
  - Encoder-Decoder: the original Transformer model (Vaswani et al., 2017)
  - Encoder-only: BERT and variants (ALBERT, DistilBERT, RoBERTa)
  - Decoder-only (i.e., auto-regressive): GPT
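A minimal sketch of the RNN encoder-decoder idea in PyTorch is shown below. The GRU layers, vocabulary sizes, and random toy inputs are illustrative assumptions; real NMT systems add attention, beam-search decoding, and far more capacity.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal RNN encoder-decoder: the decoder is a conditional LM whose
    initial hidden state is the encoder's summary of the source sentence."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source; keep only the final hidden state as a summary.
        _, h = self.encoder(self.src_emb(src_ids))
        # Decode the target conditioned on that summary (teacher forcing).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

# Hypothetical toy shapes: batch of 2, source length 5, target length 6.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))
tgt = torch.randint(0, 1200, (2, 6))
print(model(src, tgt).shape)  # torch.Size([2, 6, 1200])
```

The decoder here is simply a language model whose initial state carries the source sentence, which is what “conditional” generation means; training would apply a cross-entropy loss between these logits and the target tokens shifted by one position.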
Multilingual LM
Cross Lingual Transfer
- Goal: train on one language but work in all languages (see the sketch after this list)
- Intuition: if the model learns a good representation, it should be able to map the training it receives in one language to any other language.
- Requirement: pretraining the shared representation only needs unlabeled, monolingual data in each language.
- Transfer works better between languages that are typologically and syntactically more similar.
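A sketch of zero-shot cross-lingual transfer, assuming the Hugging Face transformers library and the public xlm-roberta-base checkpoint. The sentiment task, toy sentences, labels, and single gradient step are illustrative assumptions, not part of the course material.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a multilingual encoder with a fresh classification head (2 labels assumed).
model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# 1) Fine-tune on English only (one toy step shown; a real run loops over a dataset).
en_batch = tokenizer(["the movie was great", "the movie was terrible"],
                     return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
loss = model(**en_batch, labels=labels).loss
loss.backward()  # followed by an optimizer step in a real training loop

# 2) Evaluate on a non-English sentence without any target-language labels.
model.eval()
with torch.no_grad():
    es_batch = tokenizer(["la película fue excelente"], return_tensors="pt")
    pred = model(**es_batch).logits.argmax(dim=-1)
print(pred)
```

Because the subword vocabulary and encoder weights are shared across languages, the classifier head fine-tuned on English can be applied directly to other languages; this is the cross-lingual transfer described above.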
Common Multilingual Models
- mBERT
- XLM-RoBERTa
- mGPT
- BLOOM