Convolutional and Recurrent Neural Networks - Image Captioning Project
Presented by: Maliheh Mahdavi Sefat
In this final live presentation, I explain the core concepts of Convolutional Neural Networks (CNNs), including convolution kernels, pooling, padding, and stride, as well as Recurrent Neural Networks (RNNs) and their hidden states.
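The convolution mechanics named above (kernel, padding, stride) and pooling can be sketched in a few lines. This is a minimal NumPy illustration, not code from the project; the function names `conv2d` and `max_pool` are hypothetical:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2D convolution (cross-correlation) with zero padding and stride."""
    if padding > 0:
        image = np.pad(image, padding)  # zero-pad both spatial dims
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1  # output height
    ow = (image.shape[1] - kw) // stride + 1  # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # elementwise product, summed
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # trim so windows tile evenly
    return (feature_map[:h, :w]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))
```

With `padding=1` and `stride=1` a 3x3 kernel preserves the input's spatial size ("same" padding), while `padding=0` shrinks it by kernel_size - 1 per dimension.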
I also discuss Long Short-Term Memory (LSTM) cells and their mechanism for preserving long-term dependencies in sequential data while mitigating vanishing and exploding gradients.
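The gating mechanism that lets an LSTM preserve long-term dependencies can be sketched as a single time step in NumPy. This is an illustrative sketch of the standard LSTM equations, not the project's implementation; `lstm_step` and the gate ordering are my own conventions here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input (D,), h_prev/c_prev: previous hidden/cell state (H,)
    W: stacked gate weights (4*H, D+H), b: (4*H,)
    Gate order assumed here: forget, input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[:H])        # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])   # input gate: how much new content to write
    g = np.tanh(z[2 * H:3 * H])  # candidate cell content
    o = sigmoid(z[3 * H:])    # output gate
    c = f * c_prev + i * g    # additive update: gradients flow through c
    h = o * np.tanh(c)        # exposed hidden state
    return h, c
```

The additive cell update `c = f * c_prev + i * g` is what mitigates vanishing gradients: the gradient path through the cell state is gated multiplication by `f`, not a repeated squashing nonlinearity.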
I then describe my solution for the image captioning task using the Flickr30k image-caption dataset. I employed two pretrained CNNs, VGG16 and EfficientNetV2, to extract image features.
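Extracting image features from a pretrained CNN backbone might look like the following. This is a sketch assuming a TensorFlow/Keras setup (the repository may load and preprocess the models differently); `weights=None` keeps the example lightweight, whereas a real run would use `weights="imagenet"`:

```python
import numpy as np
import tensorflow as tf

def build_extractor(name="vgg16"):
    """Headless CNN with global average pooling -> one feature vector per image."""
    if name == "vgg16":
        return tf.keras.applications.VGG16(
            weights=None,  # use "imagenet" in practice
            include_top=False, pooling="avg", input_shape=(224, 224, 3))
    return tf.keras.applications.EfficientNetV2B0(
        weights=None, include_top=False, pooling="avg",
        input_shape=(224, 224, 3))

extractor = build_extractor("vgg16")
# One dummy image; real code would batch the Flickr30k images.
features = extractor.predict(np.zeros((1, 224, 224, 3), dtype="float32"),
                             verbose=0)
```

With `include_top=False` and `pooling="avg"`, VGG16 yields a 512-dimensional vector per image, which can then be fed to the caption decoder.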
On the captioning side, I applied preprocessing, tokenization, embedding, and LSTM layers to extract and model text features.
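The text-side preprocessing and tokenization steps can be sketched as below. This is a common recipe (lowercasing, punctuation stripping, start/end tokens, integer encoding with padding), not necessarily the exact steps in the repository; all function names here are hypothetical:

```python
import re

def preprocess(caption):
    """Lowercase, strip punctuation, wrap with start/end tokens."""
    caption = re.sub(r"[^a-z ]", "", caption.lower())
    return ["<start>"] + caption.split() + ["<end>"]

def build_vocab(captions):
    """Map tokens to integer ids, reserving special tokens."""
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for cap in captions:
        for tok in preprocess(cap):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(caption, vocab, max_len=20):
    """Integer-encode a caption, padding/truncating to max_len."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in preprocess(caption)]
    return (ids + [0] * max_len)[:max_len]
```

The resulting id sequences feed an embedding layer, whose output the LSTM consumes one token at a time while conditioning on the image features.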
After achieving 66% training accuracy for the model with EfficientNet features over 80 epochs, I evaluated the model using the BLEU metric, which measures the overlap between predicted and reference captions.
In this project, BLEU scores ranged from 0.42 for unigrams to 0.05 for 4-grams. I also incorporated visualizations to display sample test images alongside their actual and predicted captions.
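The BLEU computation behind those unigram-to-4-gram scores can be sketched from scratch. This is a minimal implementation of standard BLEU with clipped n-gram precision and the brevity penalty (in practice a library such as NLTK's `sentence_bleu` would be used; the project's exact evaluation code may differ):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """BLEU with uniform weights over 1..max_n and brevity penalty."""
    log_p = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        max_ref = Counter()  # per-n-gram max count across references
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if clipped == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_p += math.log(clipped / total) / max_n
    # brevity penalty against the closest reference length
    ref_len = min((len(r) for r in references),
                  key=lambda L: (abs(L - len(candidate)), L))
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(log_p)
```

Setting `max_n=1` gives the unigram score, and `max_n=4` the 4-gram score; the steep drop from 0.42 to 0.05 reflects how much harder it is to match four consecutive words than single words.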
Link to the project repository: https://github.com/mahdavis2024/CS-projects/tree/main/step12
#cs_internship #machine_learning #step12
Correction: In the first image, the yellow square is the kernel/filter and the green square is the input/image
Video "Convolutional and Recurrent Neural Networks - Image Captioning Project" from the CS Internship channel
Video info: October 6, 2025, 0:13:50
Duration: 00:29:57