RDMA with GPU Memory via DMA-Buf
Jianxin Xiong, Intel Corp.
Jianxin Xiong is a Software Engineer at Intel. Over the past 15+ years he has worked on various layers of the interconnect software stack, including RDMA drivers in the Linux kernel, RDMA device virtualization, the Open Fabric Interface, DAPL, the Tag Matching Interface, and Intel MPI. His current focus is GPU/accelerator scale-out with RDMA devices.
Discrete GPUs are widely used in systems for high-performance, data-parallel computation. Scale-out configurations of such systems often include RDMA-capable NICs to provide high-bandwidth, low-latency inter-node communication. On the PCIe bus, the GPU appears as a peer device of the NIC, and extra steps are needed to set up GPU memory for RDMA operations. Proprietary solutions such as Peer-Direct from Mellanox have existed for this purpose for a while. However, direct use of GPU memory in RDMA operations (a.k.a. GPU Direct RDMA) is still unsupported by upstream RDMA drivers. Dma-buf is a standard mechanism in the Linux kernel for sharing buffers for DMA access across different device drivers and subsystems. This talk presents a prototype that uses dma-buf to enable peer-to-peer DMA between the NIC and GPU memory. The required changes in the kernel RDMA driver, the user-space RDMA core libraries, and the Open Fabric Interface library (libfabric) are discussed in detail. The goal is to provide a non-proprietary approach to direct RDMA to/from GPU memory.
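The general dma-buf flow described in the abstract is: the GPU runtime exports device memory as a dma-buf file descriptor, and the RDMA stack registers that descriptor as a memory region so the NIC can DMA directly to and from GPU pages. The sketch below illustrates that flow in C, assuming the dma-buf registration verb that later landed upstream in rdma-core (ibv_reg_dmabuf_mr); gpu_alloc_dmabuf() is a hypothetical placeholder for the GPU runtime's dma-buf export call, and the prototype presented in the talk may differ in detail.

/*
 * A minimal sketch (not the verbatim prototype from the talk) of registering
 * GPU memory for RDMA through dma-buf. It assumes the dma-buf registration
 * verb that was eventually merged upstream (ibv_reg_dmabuf_mr in rdma-core).
 * gpu_alloc_dmabuf() is a hypothetical placeholder for whatever GPU runtime
 * call allocates device memory and exports it as a dma-buf file descriptor.
 */
#include <stddef.h>
#include <unistd.h>
#include <infiniband/verbs.h>

/* Hypothetical helper: allocate GPU memory and export it as a dma-buf fd. */
extern int gpu_alloc_dmabuf(size_t len, int *dmabuf_fd);

static struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t len)
{
    int fd;

    /* Export the GPU allocation as a dma-buf; the fd represents the buffer. */
    if (gpu_alloc_dmabuf(len, &fd))
        return NULL;

    /*
     * Register the dma-buf with the RDMA device. The kernel driver attaches
     * to and maps the dma-buf to obtain DMA addresses of the GPU pages, so
     * the NIC reads and writes device memory directly, with no bounce buffer.
     */
    struct ibv_mr *mr = ibv_reg_dmabuf_mr(pd, 0 /* offset */, len,
                                          0 /* iova */, fd,
                                          IBV_ACCESS_LOCAL_WRITE |
                                          IBV_ACCESS_REMOTE_READ |
                                          IBV_ACCESS_REMOTE_WRITE);

    /* Registration holds its own reference to the dma-buf. */
    close(fd);
    return mr;
}

On the libfabric side, the changes discussed in the talk expose GPU buffers through the FI_HMEM memory-registration path so that applications can hand device memory to fi_mr_regattr; the exact interface is part of what the talk covers.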
Video "RDMA with GPU Memory via DMA-Buf" from the insideHPC Report channel.