
Probing LLM Fine-Tuning via Sparse Autoencoders

In this AI Research Roundup episode, Alex discusses the paper 'A Mechanistic Investigation of Supervised Fine Tuning'. The research investigates why Supervised Fine-Tuning (SFT) significantly changes LLM behavior even though the hidden activations of the base and fine-tuned models keep a high cosine similarity. The authors introduce a diagnostic pipeline that uses pretrained Sparse Autoencoders (SAEs) to expose representational shifts that the raw activations hide. Their analysis shows that although raw activations look similar, the underlying sparse latents diverge in task-specific and layer-specific ways. The study identifies specific semantic features that are systematically altered during fine-tuning. The researchers also report a distinctive layer-wise update profile associated with safety alignment.

Paper URL: https://arxiv.org/pdf/2605.11426

#AI #MachineLearning #DeepLearning #LLM #SparseAutoencoders #FineTuning #Interpretability #SFT
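To make the contrast in the summary concrete, here is a minimal sketch of the kind of comparison it describes: measure cosine similarity on raw hidden activations, then encode the same activations with an SAE and compare which latents are active. Everything in it is an assumption for illustration and is not taken from the paper or its repository: the encoder is a toy ReLU layer with random weights standing in for a pretrained SAE, the "fine-tuned" activation is just the base activation plus small noise, and the names `sae_encode` and `active_set_jaccard` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D activation vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def sae_encode(x, W_enc, b_enc):
    """Toy SAE encoder (assumed form): latents = ReLU(W_enc @ x + b_enc)."""
    return np.maximum(W_enc @ x + b_enc, 0.0)


def active_set_jaccard(z_a, z_b, eps=1e-8):
    """Jaccard overlap between the sets of active (non-zero) latents."""
    a = z_a > eps
    b = z_b > eps
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both codes are empty, so treat them as identical
    return float(np.logical_and(a, b).sum() / union)


# Synthetic stand-ins: random weights instead of a pretrained SAE, and a
# small perturbation instead of a real fine-tuned model's activation.
rng = np.random.default_rng(0)
d_model, d_sae = 64, 512
W_enc = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_model)
b_enc = np.full(d_sae, -1.5)  # negative bias keeps most latents at zero

h_base = rng.normal(size=d_model)
h_sft = h_base + 0.1 * rng.normal(size=d_model)

z_base = sae_encode(h_base, W_enc, b_enc)
z_sft = sae_encode(h_sft, W_enc, b_enc)

print("raw cosine similarity:", cosine_similarity(h_base, h_sft))
print("active-latent Jaccard:", active_set_jaccard(z_base, z_sft))
```

In a real analysis, `h_base` and `h_sft` would be activations taken from the same layer and token position of the base and fine-tuned models, and the encoder weights would come from a pretrained SAE for that layer. The toy numbers printed here say nothing about the paper's findings; they only show how the two metrics are computed side by side.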

Resources:
- GitHub: https://github.com/ruhzi/sae-investigation

Video: Probing LLM Fine-Tuning via Sparse Autoencoders, from the AI Research Roundup channel