Gemma 4 12B QAT vs non-QAT - 16GB VRAM Local LLM setup

In this video I test the QAT (quantization-aware training) version of Google's Gemma 4 12B model, comparing the quality of Unsloth's QAT GGUF (which is q4) against Unsloth's regular q4 GGUF.

The models run on a local AI PC I built, with 16GB VRAM and 32GB DDR4 RAM.

I run both models through four tests:
1. Adherence
2. Agency
3. Coding
4. Memory

If you're interested in local LLMs, AI and homelabs from the perspective of a software engineer with many years of professional experience working with LLMs in production - feel free to subscribe!

Models:
• QAT: https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF
• non-QAT: https://huggingface.co/unsloth/gemma-4-12b-it-GGUF
GitHub: https://github.com/lukesdevlab/youtube
Patreon: https://www.patreon.com/cw/LukesDevLab

#localllm #localai #homelab #llamacpp #gemma4 #quantization #qat

Chapters:
0:00 Coming up
0:08 Intro
0:55 Models
1:16 Tests
1:39 System Specs
1:50 Adherence - q4
2:53 Adherence - QAT
3:35 Agency
5:56 Coding - q4
7:55 Coding - QAT
10:55 Memory
12:40 Conclusion

Video "Gemma 4 12B QAT vs non-QAT - 16GB VRAM Local LLM setup" from the channel Luke's Dev Lab

gemma4 gemma-4 qat 12b llm ai local
