Compassion of LLM Assistants towards Sentient Beings

This project asks whether large language model assistants represent compassion in their internal activations, and whether they extend that compassion equally to humans and animals. The motivation is simple: AI assistants increasingly mediate decisions that touch on ethics, and we have surprisingly few tools to look inside them and check.

Building on recent interpretability work, we extracted two directions in a model's activation space. The assistant axis captures what makes a model behave like an assistant, computed as the difference in activations between the assistant persona and other personas. The compassion axis captures the contrast between compassionate and cold behavior. We constructed separate compassion axes for human-directed and animal-directed compassion, then measured how each aligned with the assistant axis using cosine similarity.
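As a rough illustration of the idea above, here is a minimal sketch of a difference-of-means axis and a cosine-similarity comparison. It is not the project's actual pipeline: the function names, the hidden size, and the random arrays standing in for real model activations are all assumptions made for this example.

```python
import numpy as np


def mean_difference_axis(pos_acts, neg_acts):
    """Unit vector pointing from the mean of neg_acts to the mean of pos_acts.

    Both inputs are (num_samples, hidden_size) arrays of activations.
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical stand-ins for activations; real ones would be read from a
# model layer while it responds under each persona or behavior.
rng = np.random.default_rng(0)
hidden_size = 64
assistant_acts = rng.normal(size=(100, hidden_size))
other_persona_acts = rng.normal(size=(100, hidden_size))
compassionate_acts = rng.normal(size=(100, hidden_size))
cold_acts = rng.normal(size=(100, hidden_size))

assistant_axis = mean_difference_axis(assistant_acts, other_persona_acts)
compassion_axis = mean_difference_axis(compassionate_acts, cold_acts)

alignment = cosine_similarity(assistant_axis, compassion_axis)
print(f"cosine similarity: {alignment:.3f}")
```

With random data the printed value carries no meaning; the sketch only shows the shape of the computation. Separate human-directed and animal-directed compassion axes would come from running `mean_difference_axis` on the corresponding activation sets.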

We tested four open-weights models spanning two families and a range of parameter scales: Qwen 3 4B, Qwen 3 32B, Gemma 2 27B, and Gemma 4 31B. The cosine similarity between the compassion axis and the assistant axis is roughly 0.2 to 0.3 (20 to 30 percent) across models, suggesting that compassion is a measurable component of assistant behavior rather than an incidental one. Early results on speciesism, defined here as the difference in alignment between human-directed and animal-directed compassion, vary across models and include at least one notable reversal between model generations within the same family. We are still validating these findings and extending the analysis to additional models and persona sets.
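The speciesism measure described above can be sketched as a difference of two cosine similarities. The sign convention (human minus animal), the function name `speciesism_gap`, and the toy three-dimensional vectors are assumptions for illustration only. They are not the project's data or results.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def speciesism_gap(assistant_axis, human_axis, animal_axis):
    """Human-directed alignment minus animal-directed alignment.

    Positive: the assistant axis aligns more with human-directed compassion.
    Negative: it aligns more with animal-directed compassion.
    """
    human_alignment = cosine_similarity(human_axis, assistant_axis)
    animal_alignment = cosine_similarity(animal_axis, assistant_axis)
    return human_alignment - animal_alignment


# Toy vectors chosen by hand, not measured from any model.
assistant_axis = np.array([1.0, 0.0, 0.0])
human_axis = np.array([1.0, 1.0, 0.0])   # 45 degrees from the assistant axis
animal_axis = np.array([0.0, 1.0, 0.0])  # orthogonal to the assistant axis

gap = speciesism_gap(assistant_axis, human_axis, animal_axis)
print(f"speciesism gap: {gap:.4f}")
```

Under this convention, a reversal between model generations would show up as the gap changing sign from one model to the next.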

The broader goal is a mechanistic framework for surfacing how AI assistants represent compassion toward different sentient beings, and a foundation for shaping it deliberately.

For more about the work, visit:
https://shubham.is/compassion-axis

———

Presented by Shubham Gupta
Mentored by Jasmine Brazilek

Sentient Futures Project Incubator Showcase Spring 2026

Video: Compassion of LLM Assistants towards Sentient Beings, from the Sentient Futures channel (duration 00:04:08)
