Lecture 12.3 Famous transformers (BERT, GPT-2, GPT-3)
ERRATA:
In the "original transformer" (slide 51), in the source attention, the key and value come from the encoder, and the query comes from the decoder.
In this lecture we look at the details of some famous transformer models: how they were trained, and what they could do after training.
slides: https://dlvu.github.io/slides/dlvu.lecture12.pdf
course website: https://dlvu.github.io
Lecturer: Peter Bloem
Video: Lecture 12.3 Famous transformers (BERT, GPT-2, GPT-3), from the DLVU channel
In the "original transformer" (slide 51), in the source attention, the key and value come from the encoder, and the query comes from the decoder.
In this lecture we look at the details of some famous transformer models. How were they trained, and what could they do after they were trained.
slides: https://dlvu.github.io/slides/dlvu.lecture12.pdf
course website: https://dlvu.github.io
Lecturer: Peter Bloem
Видео Lecture 12.3 Famous transformers (BERT, GPT-2, GPT-3) канала DLVU