Pytorch LSTM Part 7

Today we are going to take what we learned building our LSTM this week and tweak a few settings to see if we can lower the loss. Our model is tiny, about 19 lines, and most of the work is in data prep: building a dictionary, the tags, and a mask (padding) token set to zero so we can ignore padding. I also want to test why `view(1, -1)` is not the same as `unsqueeze`, since they look similar but behave differently.
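Here is a small sketch of the `view` versus `unsqueeze` difference. The tensors are made up for illustration and are not taken from the video. On a 1-D tensor the two calls give the same result, which is probably why they look interchangeable, but on anything with more dimensions `view(1, -1)` flattens everything into one row while `unsqueeze(0)` only adds a leading axis.

```python
import torch

# Hypothetical 2-D tensor, e.g. 2 tokens with 3 features each.
x = torch.arange(6).reshape(2, 3)   # shape (2, 3)

viewed = x.view(1, -1)       # flattens all elements into one row: shape (1, 6)
unsqueezed = x.unsqueeze(0)  # keeps the layout, adds a leading axis: shape (1, 2, 3)

# On a 1-D tensor both calls produce the same (1, 3) result.
v = torch.arange(3)
same_for_1d = torch.equal(v.view(1, -1), v.unsqueeze(0))

print(viewed.shape, unsqueezed.shape, same_for_1d)
```

So if the tensor going into the LSTM already has a feature dimension, `view(1, -1)` quietly merges the sequence and feature axes, while `unsqueeze(0)` just adds the batch axis.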

Then we will try the next step from the tutorial: add character-level features so the model can use endings like "-ly" that often signal an adverb. That means a second LSTM that reads characters, plus character embeddings, and then we combine that with the word-level LSTM to predict part-of-speech tags per word, not per letter. We ran into shape issues, especially with batching and `NLLLoss`, and we also saw that character sequences are longer than word tag sequences, so we will need pooling or another way to collapse the character outputs into one vector per word.
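One common source of the batching trouble with `NLLLoss` is that it expects the class dimension at position 1, so a `(batch, words, tags)` output needs to be flattened or permuted first. The sketch below uses random scores as a stand-in for the tagger output; the sizes and the choice of tag index 0 as padding are assumptions for illustration, not the exact setup from the video.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C = 2, 4, 5  # batch, words per sentence, tag classes (hypothetical sizes)
PAD = 0            # assumes tag index 0 is reserved for padding

scores = torch.randn(B, T, C)              # stand-in for the tagger's raw output
log_probs = F.log_softmax(scores, dim=-1)  # NLLLoss expects log-probabilities
targets = torch.tensor([[1, 2, 3, 4],
                        [2, 1, PAD, PAD]])  # second sentence is padded

loss_fn = nn.NLLLoss(ignore_index=PAD)      # padded positions do not count

# Option 1: flatten batch and time so the input is (N, C) and the target is (N,).
loss = loss_fn(log_probs.reshape(B * T, C), targets.reshape(B * T))

# Option 2: keep the batch but move classes to dim 1, giving (B, C, T).
loss_alt = loss_fn(log_probs.permute(0, 2, 1), targets)
```

Both options should give the same loss value; which one reads better depends on how the rest of the training loop is shaped.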

We started refactoring the tokenizer to work over characters and padding, but it is not finished yet, so the plan for tomorrow is to wire up the two embeddings, two LSTMs, and the pooling step, then get training stable and see if the loss improves.
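As a rough sketch of where this is heading, the class below wires up two embeddings, two LSTMs, and a pooling step for a single sentence. Everything here is an assumption for illustration: the class name `CharWordTagger`, the layer sizes, and the use of masked mean pooling are my own picks, and taking the last hidden state of the character LSTM would be another reasonable way to get one vector per word.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharWordTagger(nn.Module):
    """Hypothetical tagger: char LSTM features + word embeddings -> word LSTM -> tags."""

    def __init__(self, n_words, n_chars, n_tags,
                 word_dim=16, char_dim=8, char_hidden=8, hidden=32):
        super().__init__()
        # Index 0 is assumed to be the padding token in both vocabularies.
        self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_dim, char_hidden, batch_first=True)
        self.word_lstm = nn.LSTM(word_dim + char_hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tags)

    def forward(self, words, chars):
        # words: (T,) word ids; chars: (T, L) char ids, zero-padded per word.
        char_out, _ = self.char_lstm(self.char_emb(chars))   # (T, L, char_hidden)
        mask = (chars != 0).unsqueeze(-1).float()            # (T, L, 1)
        # Masked mean over characters: one vector per word.
        pooled = (char_out * mask).sum(1) / mask.sum(1).clamp(min=1)
        feats = torch.cat([self.word_emb(words), pooled], dim=-1)
        word_out, _ = self.word_lstm(feats.unsqueeze(0))     # (1, T, hidden)
        return F.log_softmax(self.out(word_out.squeeze(0)), dim=-1)  # (T, n_tags)

# Made-up ids: a 3-word sentence, each word padded to 4 characters.
words = torch.tensor([1, 2, 3])
chars = torch.tensor([[1, 2, 0, 0],
                      [3, 4, 5, 6],
                      [2, 0, 0, 0]])
model = CharWordTagger(n_words=10, n_chars=10, n_tags=4)
out = model(words, chars)   # one row of tag log-probabilities per word
```

The output has one row per word rather than per letter, which is the shape `NLLLoss` needs for word-level tags.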

Video: Pytorch LSTM Part 7, from the Stephen Blum channel

lstm nlp pytorch
