Self-supervised learning

Supervised learning needs labeled data; self-supervised learning needs no labels (the supervision signal is constructed from the input itself).

Bidirectional Encoder Representations from Transformers (BERT)
- Architecture: Transformer encoder. Pre-training works like a cloze test.
- Masking input: randomly mask some of the input tokens (replace them with a special mask token or with a random token); the model learns to predict the original tokens.
- Next sentence prediction: predict whether the second sentence actually follows the first in the corpus.
- Fine-tune: continue training the pre-trained BERT on new, task-specific data.
- General Language Understanding Evaluation (GLUE): a set of benchmark tasks; fine-tuned models are compared by their GLUE score.

Downstream tasks (label some data for the task we care about):
- Sentiment analysis
- Natural Language Inference (NLI)
- Part-of-speech (POS) tagging
- Extraction-based Question Answering (QA): the answer is a span of the document.
  Document: D = {d_1, d_2, ..., d_N}
  Query: Q = {q_1, q_2, ..., q_M}
  Answer: A = {d_s, ..., d_e}, i.e. the document tokens from start position s to end position e

Multi-lingual BERT: BERT pre-trained on text from many languages.

Generative Pre-trained Transformer (GPT)
- Architecture: Transformer decoder. Pre-training task: predict the next token.
- In-context learning: the task is specified by examples in the prompt, with no gradient descent (few-shot, one-shot, zero-shot).
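
A minimal sketch of BERT's input-masking step described above. The notes only say "mask or replace with a random token"; this sketch assumes the 80/10/10 corruption scheme from the BERT paper, and the MASK_ID and VOCAB_SIZE constants are those of the standard bert-base-uncased vocabulary.

```python
import random

MASK_ID = 103       # [MASK] token id in the bert-base-uncased vocabulary
VOCAB_SIZE = 30522  # bert-base WordPiece vocabulary size

def mask_tokens(token_ids, mask_prob=0.15):
    """Randomly corrupt tokens for masked language modeling.

    Returns the corrupted sequence and the labels; -100 marks
    positions that do not contribute to the loss.
    """
    corrupted, labels = [], []
    for tid in token_ids:
        if random.random() < mask_prob:
            labels.append(tid)  # model must recover this token
            r = random.random()
            if r < 0.8:
                corrupted.append(MASK_ID)  # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                corrupted.append(tid)  # 10%: keep the original token
        else:
            corrupted.append(tid)
            labels.append(-100)  # unmasked positions are ignored by the loss
    return corrupted, labels
```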
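
One way to realize the fine-tuning step for a downstream task like sentiment analysis, sketched here with the Hugging Face transformers library (a tooling choice not made in the notes): a fresh classification head is placed on top of the pre-trained encoder, and the whole model is trained on a small amount of labeled task data.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Pre-trained encoder plus a randomly initialized classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# One labeled example from the downstream task (sentiment analysis).
batch = tokenizer("this movie was great", return_tensors="pt")
labels = torch.tensor([1])  # 1 = positive

# A single fine-tuning step: the pre-trained weights are the initialization.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```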
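
For extraction-based QA, the answer A = {d_s, ..., d_e} is a span of the document, so fine-tuning only has to predict the two positions s and e. A common realization (assumed here, not spelled out in the notes) learns two vectors that score each document token's encoder output as the start or end of the span:

```python
import torch

def predict_span(hidden_states, start_vec, end_vec):
    """Pick the answer span A = {d_s, ..., d_e} from encoder outputs.

    hidden_states: (N, H) BERT outputs for the N document tokens
    start_vec, end_vec: (H,) learned vectors scoring span boundaries
    """
    start_scores = hidden_states @ start_vec   # one score per token
    end_scores = hidden_states @ end_vec
    s = int(torch.argmax(start_scores))        # start position s
    e = int(torch.argmax(end_scores[s:])) + s  # end position e, constrained to e >= s
    return s, e
```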
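
In-context learning updates no weights: the task is specified entirely inside the prompt, and GPT's next-token prediction produces the answer. A few-shot prompt might look like the sketch below (the translation examples are illustrative, not from the notes):

```python
# Few-shot: the prompt carries task demonstrations; the model's
# next-token prediction completes the final line. No gradient descent.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"       # demonstration 1
    "peppermint => menthe poivrée\n"     # demonstration 2
    "cheese =>"                          # query; the model should predict "fromage"
)
```

Zero-shot drops the demonstrations entirely and keeps only the task description and the query; one-shot keeps a single demonstration.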