Compassion of LLM Assistants towards Sentient Beings
This project asks whether large language model assistants represent compassion in their internal activations, and whether they extend that compassion equally to humans and animals. The motivation is simple: AI assistants increasingly mediate decisions that touch on ethics, and we have surprisingly few tools to look inside them and check.
Building on recent interpretability work, we extracted two directions in a model's activation space. The assistant axis captures what makes a model behave like an assistant, computed as the difference in activations between the assistant persona and other personas. The compassion axis captures the contrast between compassionate and cold behavior. We constructed separate compassion axes for human-directed and animal-directed compassion, then measured how each aligned with the assistant axis using cosine similarity.
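The axis construction and comparison described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the project's actual pipeline: it assumes each axis is a difference of mean activations, and it uses random arrays as stand-ins for activations. The function names, array shapes, and hidden size are hypothetical.

```python
import numpy as np


def mean_difference_axis(positive_acts, negative_acts):
    """Unit-length direction pointing from the mean of negative_acts to the mean of positive_acts.

    Both inputs have shape (num_samples, hidden_size).
    Taking a difference of means is an assumption made for this sketch.
    """
    direction = positive_acts.mean(axis=0) - negative_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic stand-in activations, NOT real model data.
rng = np.random.default_rng(0)
hidden_size = 64  # hypothetical; real models are much wider
assistant_acts = rng.normal(size=(100, hidden_size))
other_persona_acts = rng.normal(size=(100, hidden_size))
compassionate_acts = rng.normal(size=(100, hidden_size))
cold_acts = rng.normal(size=(100, hidden_size))

assistant_axis = mean_difference_axis(assistant_acts, other_persona_acts)
compassion_axis = mean_difference_axis(compassionate_acts, cold_acts)

# Alignment between the two directions.
alignment = cosine_similarity(assistant_axis, compassion_axis)
print(f"alignment: {alignment:.3f}")
```

With random inputs the printed alignment carries no meaning. The sketch only shows the shape of the computation: two contrast directions, then one cosine between them.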
We tested four open-weights models spanning two families and a range of parameter scales: Qwen 3 4B, Qwen 3 32B, Gemma 2 27B, and Gemma 4 31B. The compassion axis aligns with the assistant axis at roughly 20 to 30 percent (a cosine similarity of about 0.2 to 0.3) across models, suggesting that compassion is a measurable component of assistant behavior rather than an incidental one. Early results on speciesism, defined here as the difference in alignment between human-directed and animal-directed compassion, vary across models and include at least one notable reversal between model generations within the same family. We are still validating these findings and extending the analysis to additional models and persona sets.
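The speciesism measure above is a difference between two alignment scores, which a short sketch can make concrete. The sign convention (human minus animal), the function names, and the toy three-dimensional vectors are all assumptions chosen for illustration. None of the numbers are results from the project.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def speciesism_gap(assistant_axis, human_axis, animal_axis):
    """Human-directed alignment minus animal-directed alignment.

    Positive: the assistant axis sits closer to human-directed compassion.
    Negative: it sits closer to animal-directed compassion.
    The sign convention is an assumption made for this sketch.
    """
    human_alignment = cosine_similarity(assistant_axis, human_axis)
    animal_alignment = cosine_similarity(assistant_axis, animal_axis)
    return human_alignment - animal_alignment


# Toy vectors, NOT real model directions.
assistant_axis = np.array([1.0, 0.0, 0.0])
human_axis = np.array([1.0, 1.0, 0.0])   # cosine with assistant axis: 1/sqrt(2)
animal_axis = np.array([1.0, 0.0, 2.0])  # cosine with assistant axis: 1/sqrt(5)

gap = speciesism_gap(assistant_axis, human_axis, animal_axis)
print(f"speciesism gap: {gap:.4f}")
```

Under this convention, a reversal between model generations would show up as the gap changing sign from one model to the next.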
The broader goal is a mechanistic framework for surfacing how AI assistants represent compassion toward different sentient beings, and a foundation for shaping it deliberately.
For more about the work, visit:
https://shubham.is/compassion-axis
———
Presented by Shubham Gupta
Mentored by Jasmine Brazilek
Sentient Futures Project Incubator Showcase Spring 2026
Video "Compassion of LLM Assistants towards Sentient Beings" from the Sentient Futures channel
Video information
Yesterday, 11:26:48
00:04:08