TCP Device Memory Nears Finish Line to Create More Efficient Network Accelerators

LINUX NETWORKS

A year ago, Google engineers released experimental Linux kernel code for Device Memory TCP to transfer data more efficiently between GPUs/accelerators and network devices without having to go through the host CPU's memory buffer. After multiple rounds of review, Device Memory TCP seems to be approaching the finish line.

Device Memory TCP ("Devmem TCP") is a Linux kernel feature being developed to allow efficient transfer of data to and/or from device memory without having to bounce that data through a host memory buffer. Workloads such as AI training across multiple connected systems, which lean on TPUs/GPUs/NPUs and other acceleration devices, have steep memory and network bandwidth requirements, so the goal is to avoid copies through host system memory when sending or receiving data from these devices over the network.

TCP Device Memory Enables More Efficient Network Communication

Device Memory TCP introduces socket APIs to allow device memory to be sent directly over the network and incoming network packets to be received directly into device memory. This helps both to avoid pressure on host memory bandwidth and to reduce pressure on PCI Express bandwidth by not having to go through the PCIe root complex.

Device Memory TCP looks like it's finishing up its trek to the mainline kernel: the preparatory work was queued last week in the networking subsystem's "net-next" tree, so it will land with Linux 6.11 at the earliest. There's still a week or two left to see whether the Device Memory TCP work itself will be queued in net-next ahead of the v6.11 merge window; otherwise it looks set to come in v6.12, which should be exciting considering v6.12 is likely to be the 2024 LTS kernel release.