
김동화 - Transformer & BERT

[Corrections to errors in the talk]
- 1:39:24 — before the warmup step: warmup learning rate; after it: the regular learning rate
- At 48:13, the "Attended encoder output" refers to Korean, not English; each row can be interpreted as the vector of a Korean word
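The correction at 1:39:24 concerns the two phases of the Transformer's learning-rate schedule. As a minimal sketch (not the presenter's code), the schedule from "Attention Is All You Need" rises linearly during warmup and then decays as the inverse square root of the step; `d_model=512` and `warmup_steps=4000` are the paper's defaults:

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    """Transformer learning-rate schedule: linear warmup for the first
    `warmup_steps` steps, then inverse-square-root decay."""
    step = max(step, 1)  # avoid 0**-0.5 at the first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

Before `warmup_steps` the `step * warmup_steps**-1.5` term is smaller (the "warmup learning rate"); after it, the `step**-0.5` term takes over.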

[Presentation slides]
https://drive.google.com/open?id=11sFhmiAsOL8WgNkqZighXgk7BPPqnieL

[Reference code]
Transformer: https://github.com/Kyubyong/transformer
BERT: https://github.com/google-research/bert

[Additional notes]
In the code above, the linear transformation at the end of the Transformer's multi-head attention is omitted; note that the paper applies one more projection there.
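The missing step can be sketched as follows: after the per-head attention outputs are concatenated, the paper (Sec. 3.2.2) multiplies by an output matrix W^O. This is an illustrative NumPy sketch, not the referenced repo's code; the weight matrices are hypothetical placeholders passed in by the caller:

```python
import numpy as np

def multi_head_attention(x, num_heads, Wq, Wk, Wv, Wo):
    """Multi-head self-attention including the final output projection Wo,
    the step the referenced implementation leaves out."""
    T, d_model = x.shape
    d_k = d_model // num_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv                       # (T, d_model) each
    # Split the model dimension into heads: (num_heads, T, d_k).
    split = lambda m: m.reshape(T, num_heads, d_k).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)       # (heads, T, T)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)              # row-wise softmax
    heads = weights @ v                                    # (heads, T, d_k)
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)  # concat heads
    return concat @ Wo  # the extra linear projection the paper specifies
```

Dropping the final `concat @ Wo` still produces tensors of the right shape, which is why the omission is easy to miss; the projection is what lets the heads mix information after attending independently.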

Video: 김동화 - Transformer & BERT, from the 고려대학교 산업경영공학부 DSBA 연구실 channel
[Video information]
Published: March 14, 2019, 13:54:58
Duration: 02:01:20