
Fleet: Optimizing LLM Inference on Chiplet GPUs

In this AI Research Roundup episode, Alex discusses the paper 'Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs'. Fleet introduces a multi-level task model designed for modern chiplet-based GPUs. By mapping computation directly to memory scopes, it resolves the mismatch between flat programming models and hierarchical hardware. The core innovation is the Chiplet-task abstraction, which coordinates work through shared L2 caches to reduce redundant memory traffic. Tested on AMD Instinct MI350 hardware with Qwen3-8B, Fleet significantly reduced decode latency compared to vLLM. This approach improves cache utilization and performance for memory-bound workloads such as LLM inference.

Paper URL: https://arxiv.org/pdf/2604.15379

#AI #MachineLearning #DeepLearning #LLMInference #GPUArchitecture #Chiplets #AMDInstinct #ParallelComputing
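To make the Chiplet-task idea concrete, here is a minimal, hypothetical sketch of the scheduling principle described above: group fine-grained work items so that all items touching the same weight slice land on the same chiplet, letting them reuse that chiplet's shared L2 cache instead of each re-fetching the slice from HBM. The function name, the tile representation, and the 8-chiplet layout are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical illustration of chiplet-scoped task grouping.
# Assumption: a decode step is tiled as (weight_slice_id, token_block_id)
# pairs; the chiplet count and round-robin placement are made up here.

from collections import defaultdict

NUM_CHIPLETS = 8  # illustrative count for a multi-die GPU


def assign_chiplet_tasks(tiles):
    """Map (weight_slice, token_block) tiles to chiplets so that every
    tile reading a given weight slice shares one chiplet's L2 cache."""
    # Collect all token blocks that consume each weight slice.
    groups = defaultdict(list)
    for weight_slice, token_block in tiles:
        groups[weight_slice].append(token_block)

    # Place each slice group on one chiplet (round-robin for balance),
    # so the slice is fetched from HBM once per chiplet, not once per tile.
    schedule = defaultdict(list)
    for i, (weight_slice, token_blocks) in enumerate(sorted(groups.items())):
        schedule[i % NUM_CHIPLETS].append((weight_slice, token_blocks))
    return dict(schedule)


# 16 weight slices, each consumed by 4 token blocks (64 tiles total).
tiles = [(s, t) for s in range(16) for t in range(4)]
schedule = assign_chiplet_tasks(tiles)
```

With a flat (unscoped) schedule, each of the 64 tiles could trigger its own HBM read of a weight slice; the grouped schedule bounds that to one read per slice per chiplet, which is the kind of redundant-traffic reduction the Chiplet-task abstraction targets.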

Video 'Fleet: Optimizing LLM Inference on Chiplet GPUs' from the AI Research Roundup channel