
Convolutional and Recurrent Neural Networks - Image Captioning Project

Presented by: Maliheh Mahdavi Sefat
In this final live presentation, I explain the core concepts of Convolutional Neural Networks (CNNs), including convolution kernels, pooling, padding, and stride, as well as Recurrent Neural Networks (RNNs) and their hidden states.
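The arithmetic behind kernels, padding, and stride can be sketched in plain Python (a toy single-channel example, not the project's actual Keras code):

```python
def conv_output_size(n, k, p=0, s=1):
    """Output width/height of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2D convolution on nested lists (single channel, square inputs)."""
    n, k = len(image), len(kernel)
    # Zero-pad the image on all sides.
    size = n + 2 * padding
    padded = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            padded[i + padding][j + padding] = image[i][j]
    out = conv_output_size(n, k, padding, stride)
    result = [[0.0] * out for _ in range(out)]
    for i in range(out):
        for j in range(out):
            # Slide the kernel over the padded image and sum elementwise products.
            result[i][j] = sum(
                padded[i * stride + di][j * stride + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k)
            )
    return result

# A 5x5 input with a 3x3 kernel, stride 1, no padding -> 3x3 output.
image = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]
out = conv2d(image, kernel)
print(len(out), len(out[0]))  # 3 3
print(out[0][0])              # 9.0 (sum over one 3x3 window of ones)
```

The same size formula explains why "same" padding with a 3x3 kernel needs p=1: a 28-pixel input at stride 2 then comes out at 14 pixels.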
I also discuss Long Short-Term Memory (LSTM) cells and their mechanism for preserving long-term dependencies in sequential data while mitigating vanishing and exploding gradients.
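The gating mechanism can be illustrated with a minimal single-unit LSTM step in pure Python (scalar weights chosen by hand for illustration, not trained values):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM cell.
    w holds scalar (input, hidden, bias) weights per gate:
    f = forget, i = input, g = candidate, o = output."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])
    c = f * c_prev + i * g   # additive cell-state path preserves long-term info
    h = o * math.tanh(c)     # hidden state exposed to the next layer
    return h, c

# With the forget gate saturated near 1 and the input gate near 0,
# the cell state barely decays across many steps -- the mechanism that
# lets LSTMs carry long-range dependencies without vanishing gradients.
w = {"f": (0.0, 0.0, 10.0),   # forget gate ~ sigmoid(10) ~ 1
     "i": (0.0, 0.0, -10.0),  # input gate ~ 0: write almost nothing new
     "g": (1.0, 0.0, 0.0),
     "o": (0.0, 0.0, 10.0)}
h, c = 0.0, 1.0
for t in range(50):
    h, c = lstm_step(0.0, h, c, w)
print(round(c, 3))  # ~0.998: still close to the initial 1.0 after 50 steps
```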
I then describe my solution for the image captioning task using the Flickr30k image-caption dataset. I employed two pretrained CNNs -- VGG16 and EfficientNetV2 -- to extract image features.
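A feature-extraction setup along these lines can be sketched with tf.keras (a hedged sketch, not the project's code; weights=None is used here only to avoid the ImageNet download -- the real pipeline would load weights="imagenet"):

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Drop the classifier head and global-average-pool the last conv features,
# so every image maps to a single 512-dimensional feature vector.
extractor = VGG16(weights=None, include_top=False,
                  pooling="avg", input_shape=(224, 224, 3))

batch = preprocess_input(np.random.rand(2, 224, 224, 3) * 255.0)
features = extractor.predict(batch, verbose=0)
print(features.shape)  # (2, 512)
```

Swapping in an EfficientNetV2 backbone works the same way, only the feature dimension changes (e.g. 1280 for EfficientNetV2-B0).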
On the captioning side, I applied preprocessing, tokenization, embedding, and LSTM layers to extract and model text features.
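The preprocessing and tokenization steps on the text side can be sketched in plain Python (a minimal illustration; the tokens, vocabulary scheme, and max_len here are assumptions, not the project's actual configuration):

```python
def build_vocab(captions):
    """Map each word to an integer id; 0 is reserved for padding."""
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
    for caption in captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(caption, vocab, max_len):
    """Wrap with start/end tokens, convert to ids, pad to a fixed length."""
    tokens = ["<start>"] + caption.lower().split() + ["<end>"]
    ids = [vocab[w] for w in tokens]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

captions = ["A dog runs", "A dog sleeps on grass"]
vocab = build_vocab(captions)
seq = encode("A dog runs", vocab, max_len=8)
print(seq)  # [1, 3, 4, 5, 2, 0, 0, 0]
```

Sequences like these then feed an embedding layer, which maps each integer id to a dense vector before the LSTM models the word order.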
After achieving 66% training accuracy for the model with EfficientNet features over 80 epochs, I evaluated the model using the BLEU metric, which measures the overlap between predicted and reference captions.
In this project, BLEU scores ranged from 0.42 for unigrams to 0.05 for 4-grams. I also incorporated visualizations to display sample test images alongside their actual and predicted captions.
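The core of BLEU, modified n-gram precision, can be sketched in a few lines of Python (without the brevity penalty or smoothing that full BLEU implementations such as NLTK's add; the example captions are made up):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Modified n-gram precision: candidate n-gram counts are clipped
    by how often each n-gram appears in the reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

cand = "a dog runs on the grass"
ref = "the dog runs across the grass"
print(ngram_precision(cand, ref, n=1))  # 4 of 6 unigrams match: 0.666...
print(ngram_precision(cand, ref, n=4))  # no 4-gram matches: 0.0
```

This mirrors the pattern in the reported scores: unigram precision is forgiving, while matching whole 4-word phrases is much harder, so 4-gram scores drop sharply.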
Link to the project repository: https://github.com/mahdavis2024/CS-projects/tree/main/step12
#cs_internship #machine_learning #step12
Correction: In the first image, the yellow square is the kernel/filter and the green square is the input/image.

Video "Convolution and Recurrent Neural Networks - Image Captioning Project" from the CS Internship channel.