Demystifying RDMA Protocols for GPU Data Centers | NVLink, ConnectX, EFA, InfiniBand, GPUDirect
RDMA (Remote Direct Memory Access) is the secret sauce behind fast GPU clusters, which make training billion-parameter LLMs feasible.
But once you go beyond a single vendor stack, the protocols, drivers, and libraries start to feel like a treasure hunt.
In this video, we explore how RDMA protocols really work for GPU-accelerated deep learning, and what it takes to design a generic RDMA library that can run across InfiniBand, RoCEv2, cloud fabrics like AWS EFA, and different NIC / GPU generations.
We’ll break down:
- NVLink vs RDMA (collective or peer-to-peer)
- The pain of p2p RDMA: hidden assumptions baked into common libraries (NCCL, ConnectX, DeepEP)
- Why building a “portable” RDMA abstraction is hard: memory registration, congestion control, reliability, ordering, and NIC quirks across vendors and clouds
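To make the ordering point above concrete, here is a minimal, pure-Python sketch of one way a portable layer could avoid depending on in-order delivery. It is a simulation under stated assumptions, not the method from the video or the API of any real RDMA library: the names `ReceiveBuffer`, `on_write`, and `transfer` are hypothetical, and the shuffle stands in for a fabric that may deliver writes out of order. Each write carries its own destination offset, and the receiver detects completion by counting chunks rather than by waiting for a "last" write.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReceiveBuffer:
    """Hypothetical receiver-side buffer that tolerates out-of-order arrival."""
    total_chunks: int
    chunk_size: int
    data: bytearray = field(init=False)
    arrived: set = field(default_factory=set)

    def __post_init__(self) -> None:
        # Pre-allocate the whole destination region, as a registered buffer would be.
        self.data = bytearray(self.total_chunks * self.chunk_size)

    def on_write(self, chunk_index: int, payload: bytes) -> None:
        # Every write names its own offset, so arrival order does not matter.
        start = chunk_index * self.chunk_size
        self.data[start:start + len(payload)] = payload
        self.arrived.add(chunk_index)

    def is_complete(self) -> bool:
        # Completion is detected by counting chunks, not by seeing the final one.
        return len(self.arrived) == self.total_chunks


def transfer(message: bytes, chunk_size: int, seed: int = 0) -> bytes:
    """Split a message into chunks, deliver them in shuffled order, reassemble."""
    chunks = [
        (index, message[offset:offset + chunk_size])
        for index, offset in enumerate(range(0, len(message), chunk_size))
    ]
    random.Random(seed).shuffle(chunks)  # simulated reordering (assumption)
    buffer = ReceiveBuffer(total_chunks=len(chunks), chunk_size=chunk_size)
    for index, payload in chunks:
        buffer.on_write(index, payload)
    assert buffer.is_complete()
    return bytes(buffer.data[:len(message)])
```

Calling `transfer(b"hello rdma world", 4)` returns the original bytes regardless of the shuffle, which is the property a portable abstraction would want to hold on both ordered and unordered transports.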
Lessons inspired by engineering write-ups from Perplexity and others on scaling LLMs across thousands of GPUs with custom RDMA kernels and point-to-point data transfer.
🔍 Who is this for?
- ML / DL engineers working on distributed training (NCCL, ConnectX, DeepSpeed, KV cache transfer, custom MoE stacks)
- Infra / platform teams running GPU clusters, AI data centers, or cloud-hosted training environments
- Anyone trying to squeeze more performance out of multi-node GPU training jobs
- Anyone who wants to demystify what RDMA libraries are doing under the hood
Video "Demystifying RDMA Protocols for GPU Data Centers | NVLink, ConnectX, EFA, InfiniBand, GPUDirect" from the OffNote Labs channel
Video information
Published: November 14, 2025, 16:15:12
Duration: 00:10:18