Self-supervised learning

  • Supervised

    Trained on labeled data.

  • Self-supervised

    No human labels; one part of the input is used to supervise another part.

Bidirectional Encoder Representations from Transformers (BERT)
A Transformer encoder; pre-training works like a cloze test (predict masked-out tokens).
  • Masking input

    Randomly mask some tokens (replace them with a special mask token or with a random token) and train the model to recover the original tokens; see the masking sketch after this list.

  • Next sentence prediction

    Predict whether two sentences appear consecutively in the corpus.
  • Fine-tune
    Continue training the pre-trained BERT on labeled data for a new task.

    General Language Understanding Evaluation (GLUE)

    A benchmark of language-understanding tasks; fine-tuned BERT is commonly evaluated by its GLUE score. Example downstream tasks:

    1. Sentiment analysis
    2. Natural Language Inference (NLI)
    3. Part-of-speech tagging (POS)
    4. Extraction-based Question Answering (QA)

      Document: D = {d_1, d_2, ..., d_N}

      Query: Q = {q_1, q_2, ..., q_M}

      Answer: A = {d_s, ..., d_e}, a span of the document; the model predicts the start index s and the end index e (see the span-prediction sketch after this list).

  • Downstream task
    Label a small amount of data for the task we care about, then fine-tune the pre-trained model on it.
  • Multi-lingual BERT

    Pre-trained on text from many languages, enabling cross-lingual transfer.
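
A minimal sketch of the masking step, assuming PyTorch; the vocabulary size, the mask-token id, and the exact corruption split are illustrative placeholders, not values taken from these notes.

```python
import torch

VOCAB_SIZE = 30522   # hypothetical vocabulary size
MASK_ID = 103        # hypothetical id of the special mask token

def mask_tokens(token_ids: torch.Tensor, mask_prob: float = 0.15):
    """Corrupt a random subset of tokens and keep the originals as prediction targets."""
    labels = token_ids.clone()
    selected = torch.rand(token_ids.shape) < mask_prob   # positions to corrupt
    labels[~selected] = -100                              # unselected positions: no loss

    corrupted = token_ids.clone()
    # Most selected positions become the mask token ...
    use_mask = selected & (torch.rand(token_ids.shape) < 0.8)
    corrupted[use_mask] = MASK_ID
    # ... some of the rest become a random token, and the remainder stay unchanged.
    use_random = selected & ~use_mask & (torch.rand(token_ids.shape) < 0.5)
    corrupted[use_random] = torch.randint(VOCAB_SIZE, token_ids.shape)[use_random]
    return corrupted, labels

# The encoder is trained to predict the original token at every corrupted position,
# e.g. with cross-entropy over the vocabulary (ignore_index=-100).
```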
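
For extraction-based QA, a sketch of the span-prediction head added during fine-tuning, assuming PyTorch and an encoder that outputs one hidden vector per document token; the hidden size and the names here are assumptions, not part of the notes.

```python
import torch
import torch.nn as nn

HIDDEN_DIM = 768  # hypothetical hidden size of the encoder

class SpanHead(nn.Module):
    """Predicts the answer span A = {d_s, ..., d_e} inside the document."""
    def __init__(self, hidden_dim: int = HIDDEN_DIM):
        super().__init__()
        # Two learned vectors: one scores start positions, one scores end positions.
        self.start_vec = nn.Parameter(torch.randn(hidden_dim))
        self.end_vec = nn.Parameter(torch.randn(hidden_dim))

    def forward(self, doc_hidden: torch.Tensor):
        # doc_hidden: (N, hidden_dim), encoder outputs for document tokens d_1 ... d_N
        start_scores = doc_hidden @ self.start_vec   # (N,)
        end_scores = doc_hidden @ self.end_vec       # (N,)
        s = start_scores.argmax().item()             # predicted start index
        e = end_scores.argmax().item()               # predicted end index
        return s, e

# During fine-tuning only this head is new; the encoder is initialized from
# pre-trained BERT and both are trained with cross-entropy on the true (s, e).
```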
Generative Pre-trained Transformer (GPT)
A Transformer decoder; pre-trained by predicting the next token.
  • In-context learning

    No gradient descent or parameter updates; the task description and examples are given directly in the prompt (few-shot, one-shot, zero-shot). See the prompt sketch below.
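
A minimal sketch of a few-shot prompt; the translation task and the example pairs are illustrative (in the style of the GPT-3 paper), not taken from these notes.

```python
# The task is described and demonstrated entirely inside the prompt;
# the model's parameters are never updated.
few_shot_prompt = (
    "Translate English to French:\n"      # task description
    "sea otter => loutre de mer\n"        # example 1
    "cheese => fromage\n"                 # example 2
    "plush giraffe => girafe peluche\n"   # example 3
    "mint => "                            # the model continues with the answer
)
# One-shot keeps a single example; zero-shot keeps only the task description.
# No gradient descent: the prompt is simply fed to the pre-trained model.
```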