[Comp Linguistics] Neural Machine Translation

Authored by Tony Feng

Created on Nov 3rd, 2022

Last Modified on Nov 10th, 2022

Intro

This series of posts contains a summary of materials and readings from the course CSCI 1460 Computational Linguistics that I took @ Brown University. The class explores techniques behind recent advances in NLP with deep learning. I post these “Notes” (what I’ve learned) for study and review only.


MT Evaluation

Human Evaluation

  • Explicit ratings for fluency and faithfulness
  • Most reliable, but expensive to collect
  • New collections are needed for each system
  • Can’t be “hill climbed” (systems can’t be tuned directly against it)

Automatic Evaluation

  • NLP prefers automatic eval for standardization and optimization.
  • Popular metrics for MT: BLEU, ESIM, BLEURT.
  • However, once the system is sufficiently good, metrics stop correlating with human judgements.

BLEU

$$ BLEU=BP \times \exp \left(\frac{1}{N} \sum_{n=1}^{N} \log p_{n}\right) $$

  • Assume we have an MT output (Candidate) and are comparing it against multiple human-generated translations (References).
  • Intuition: we should reward models for producing translations that contain many of the same words/phrases as the references.

$$ p_{n}=\frac{\sum_{c \in \text{cand}} \sum_{ngm \in c} {count_{clip}} (ngm)}{\sum_{c^{\prime} \in \text{cand}} \sum_{ngm^{\prime} \in c^{\prime}} \operatorname{count}\left(ngm^{\prime}\right)} $$

$$ BP = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \leq r \end{cases} $$

where $BP$ is the brevity penalty, $p_n$ is the modified (clipped) n-gram precision, $c$ is the candidate length, and $r$ is the effective reference length.
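
Below is a minimal sentence-level BLEU sketch written directly from the formulas above; the function names, the choice of effective reference length, and the handling of zero precisions are my own assumptions, not part of the course notes:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision p_n for one candidate against multiple references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # Clip each candidate n-gram count by its maximum count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for ng, c in Counter(ngrams(ref, n)).items():
            max_ref_counts[ng] = max(max_ref_counts[ng], c)
    clipped = sum(min(c, max_ref_counts[ng]) for ng, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU = BP * exp(mean of log p_n)."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # any zero precision drives the geometric mean to zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c = len(candidate)
    # Effective reference length: the reference closest in length to the candidate.
    r = min((len(ref) for ref in references), key=lambda l: (abs(l - c), l))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * geo_mean

print(bleu("the cat is on the mat".split(),
           ["the cat is on the mat".split(), "there is a cat on the mat".split()]))
```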


Neural MT

Encoder-Decoder Model

  • It is also known as the “sequence-to-sequence (seq2seq)” model.
  • Intuition
    • “Conditional” text generation/language modeling.
    • The output is dependent on some input.
  • Examples
    • RNN Encoder-Decoder (a minimal sketch follows this list)
    • Transformer Encoder-Decoder
  • Many other models are inspired by this structure
    • Encoder-Decoder: Original Transformer Model (Vaswani et al., 2017)
    • Encoder-only: BERT and variants (ALBERT, DistilBERT, RoBERTa)
    • Decoder-only (i.e., auto-regressive): GPT
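
As a concrete toy illustration, here is a minimal GRU encoder-decoder in PyTorch; the hyperparameters, module names, and teacher-forcing setup are illustrative assumptions rather than a reference implementation:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder: the decoder is conditioned on the
    encoder's final hidden state (conditional language modeling)."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence; keep only the final hidden state.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Teacher forcing: feed the gold target tokens, predict the next token.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), hidden)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

# Toy usage: a batch of 2 source sentences of length 5, targets of length 6.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))
tgt = torch.randint(0, 1200, (2, 6))
logits = model(src, tgt[:, :-1])                       # predict tokens 1..5
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1200), tgt[:, 1:].reshape(-1))
print(loss.item())
```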

Multilingual LM

Cross Lingual Transfer

  • Goal: train on one language but work in all languages (a zero-shot sketch follows the model list below)
  • Intuition: if the model learns a good multilingual representation, it should be able to map what it learns from training in one language to any other language.
  • Requirements: unlabeled, monolingual data
  • Transfer works better between languages that are more typologically and syntactically similar.

Common Multilingual Models

  • mBERT
  • XLM-RoBERTa
  • mGPT
  • BLOOM
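
A minimal sketch of zero-shot cross-lingual transfer with one of the models above (XLM-RoBERTa) via the Hugging Face transformers library; the sentiment task, label count, and example sentences are illustrative assumptions, and the fine-tuning step is only described in comments:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# One of the multilingual models listed above; the 2-label sentiment task
# is an illustrative assumption, not from the course.
name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Step 1 (not shown): fine-tune `model` on *English* labeled data only,
# with a standard classification loss.

# Step 2: apply the same fine-tuned model to other languages with no
# additional labeled data (zero-shot cross-lingual transfer).
examples = [
    "This movie was wonderful.",        # English (seen during fine-tuning)
    "Ce film était merveilleux.",       # French  (unseen at fine-tuning time)
    "Dieser Film war wunderbar.",       # German  (unseen at fine-tuning time)
]
with torch.no_grad():
    for text in examples:
        inputs = tokenizer(text, return_tensors="pt")
        logits = model(**inputs).logits
        print(text, "->", logits.softmax(dim=-1).tolist())
```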
