DriveNets Expands AI Networking Fabric to Support Multi-Site Distributed GPU Clusters
DriveNets upgrades its Network Cloud-AI platform with multi-tenancy and multi-site support, letting GPU clusters span sites up to 80 km apart. The multi-site capability, built on the company's cell-based fabric architecture, helps operators work around per-site power limits that increasingly constrain AI deployments.
Figure 1. DriveNets Boosts AI Networking Fabric with Multi-Site Support for Distributed GPU Clusters.
AI is becoming a crucial driver for high-performance networks, but connecting GPUs poses unique challenges compared to CPUs (Figure 1).
Among the networking vendors tackling these AI networking hurdles is DriveNets. Founded in 2015, the company has steadily gained momentum over the past decade, winning major customers such as AT&T, Comcast, Telefónica, and Orange.
Beyond its service provider routing technology, DriveNets has developed an Ethernet-based AI networking fabric called Network Cloud-AI, designed specifically for hyperscalers, neoclouds, and enterprises building GPU clusters.
This week, DriveNets announced major upgrades to its Network Cloud-AI platform, adding multi-tenancy and multi-site capabilities. These enhancements allow GPU clusters to span geographic distances of up to 80 kilometers, effectively overcoming power limitations that are increasingly constraining AI deployments.
“Due to electricity concerns and the demands of large data centers, many customers are exploring the idea of building a single GPU cluster distributed across multiple locations,” explained Inbar Lasser-Raab, Chief Marketing Officer at DriveNets. “Our distributed switch architecture on white boxes enables us to balance deep and shallow buffers, supporting high performance across the entire cluster—even when it spans multiple sites.”
Cell-Based Fabric Architecture Powers Performance
DriveNets’ networking architecture fundamentally differs from traditional data center networks. Instead of relying on standard Clos Ethernet designs, it uses a distributed fabric based on a cell-based protocol.
“We use a familiar physical setup with top-of-rack, leaf, and spine switches,” said Dudy Cohen, Vice President of Product Marketing at DriveNets, in an interview with Network World. “However, the communication between the top-of-rack switch—connecting NICs to servers—and the rest of the network is not based on Clos Ethernet. Instead, it employs a unique cell-based protocol, the same one used in chassis backplanes.”
Cohen detailed that incoming data packets are sliced into evenly sized cells at the ingress switch, distributed evenly across the fabric, then reassembled at the destination. This contrasts with other solutions that require specialized endpoint components, like Nvidia BlueField DPUs.
“Our fabric links between top-of-rack and spine are perfectly load-balanced,” Cohen said. “We avoid hashing mechanisms, which lets us manage congestion entirely within the fabric without external help.”
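To make the slicing-and-spraying idea concrete, here is a minimal Python sketch of cell spraying and reassembly. The cell size, link count, and data structures are invented for illustration; the article does not publish the specifics of DriveNets' backplane-derived cell protocol.

```python
from itertools import cycle

CELL_SIZE = 256  # bytes; hypothetical fixed cell size, not a DriveNets spec

def slice_into_cells(packet: bytes, packet_id: int):
    """Ingress: slice a packet into fixed-size, sequence-numbered cells."""
    return [
        (packet_id, seq, packet[i:i + CELL_SIZE])
        for seq, i in enumerate(range(0, len(packet), CELL_SIZE))
    ]

def spray_cells(cells, num_links: int):
    """Spray cells round-robin across every fabric link -- no per-flow
    hashing, so each link carries an even share of the load."""
    links = [[] for _ in range(num_links)]
    next_link = cycle(range(num_links))
    for cell in cells:
        links[next(next_link)].append(cell)
    return links

def reassemble(links):
    """Egress: reorder cells by (packet_id, sequence) and rebuild the packet."""
    in_order = sorted(cell for link in links for cell in link)
    return b"".join(payload for _, _, payload in in_order)

packet = bytes(range(256)) * 4           # a 1,024-byte test packet
fabric = spray_cells(slice_into_cells(packet, packet_id=7), num_links=4)
assert reassemble(fabric) == packet      # payload survives the round trip
```

Because cells are sprayed round-robin rather than hashed per flow, a single elephant flow cannot pin itself to one link, which is what keeps the fabric links evenly loaded.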
Multi-Site Implementation for Distributed GPU Clusters
This multi-site capability enables organizations to bypass power limitations of individual data centers by distributing GPU clusters across multiple geographic locations.
The design isn’t intended as a backup or failover system. Lasser-Raab emphasized that it functions as a single, unified cluster spanning two locations up to 80 kilometers apart, which also lets the cluster draw on separate power grids for improved resilience.
Physically, the implementation relies on high-bandwidth links between the sites. Cohen explained that these connections typically run over dark fiber or DWDM (dense wavelength division multiplexing) optical links, usually four 800 Gbps Ethernet channels bundled into a single 3.2 Tbps connection.
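As a quick sanity check on those figures, the arithmetic below confirms the aggregate bandwidth and estimates one-way propagation delay over 80 km of fiber. The ~5 µs/km figure is a standard approximation for light in glass, not a DriveNets number.

```python
# Inter-site link figures quoted in the article
channels = 4
channel_rate_gbps = 800
aggregate_tbps = channels * channel_rate_gbps / 1000
print(f"Aggregate inter-site bandwidth: {aggregate_tbps} Tbps")   # 3.2 Tbps

# Propagation delay over the maximum supported distance
distance_km = 80
fiber_delay_us_per_km = 5   # ~5 us/km for light in fiber (assumption)
one_way_us = distance_km * fiber_delay_us_per_km
print(f"One-way propagation over {distance_km} km: ~{one_way_us} us")  # ~400 us
```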
Enhanced Multi-Tenancy for AI Workloads
For service providers offering GPU-as-a-service and enterprises running multiple AI workloads on Kubernetes, DriveNets has improved traffic isolation features. Kubernetes is widely adopted as the cloud-native platform for managing AI workloads.
“If you run Kubernetes, you often have multiple workloads and tenants sharing the same cluster,” Cohen said. “Maintaining quality of service for each workload or tenant is critical, especially to prevent noisy neighbors from impacting others.”
Thanks to its cell-based fabric, DriveNets ensures strong isolation: no noisy tenant or workload can degrade others, even when they run as separate Kubernetes workloads on the same infrastructure.
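The article does not spell out the isolation mechanism itself, but the principle can be illustrated with a weighted round-robin scheduler over per-tenant queues, one classical way a fabric can guarantee each tenant its share of link capacity. The tenants, weights, and queue model below are hypothetical.

```python
from collections import deque

# Hypothetical per-tenant cell queues: tenant-a is "noisy" with a deep
# backlog, tenant-b is quiet. Weights set each tenant's guaranteed share.
queues = {
    "tenant-a": deque(f"a{i}" for i in range(10)),
    "tenant-b": deque(f"b{i}" for i in range(3)),
}
weights = {"tenant-a": 1, "tenant-b": 1}  # equal guaranteed shares

def schedule(queues, weights, slots: int):
    """Weighted round-robin over tenant queues: each tenant sends up to
    weights[t] cells per round, so no queue can starve another."""
    sent = []
    while slots > 0 and any(queues.values()):
        for tenant, q in queues.items():
            for _ in range(weights[tenant]):
                if q and slots > 0:
                    sent.append(q.popleft())
                    slots -= 1
    return sent

print(schedule(queues, weights, slots=8))
# ['a0', 'b0', 'a1', 'b1', 'a2', 'b2', 'a3', 'b3']
```

Even though tenant-a has far more cells queued, the scheduler alternates between tenants each round, so tenant-b's traffic is never starved by its noisy neighbor.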
AI-Powered Operations and Products
While DriveNets builds networks tailored for AI workloads, it also integrates AI capabilities within its own products and operations. “We often joke that we do both networking for AI and AI for networking,” Cohen said.
On the product side, DriveNets is embedding AI into network management and orchestration. By training AI models on massive volumes of network logs, the company has developed an assisted root cause analysis feature that helps identify dependencies and issues more efficiently.
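To give a flavor of what dependency-aware root cause analysis looks like, the toy sketch below walks an invented component graph to find the most upstream failing element. The graph, log set, and heuristic are illustrative assumptions, not DriveNets' actual model.

```python
# Toy dependency-aware root cause analysis. When alarms fire on every
# layer, the most upstream failing dependency is the likeliest root cause.
depends_on = {
    "gpu-job": ["tor-switch"],
    "tor-switch": ["fabric-spine"],
    "fabric-spine": ["optics-link"],
    "optics-link": [],
}

# Components currently showing errors in the logs (invented for the example)
failing = {"gpu-job", "tor-switch", "fabric-spine", "optics-link"}

def root_cause(component, depends_on, failing):
    """Follow the chain of failing dependencies to its deepest point."""
    for dep in depends_on[component]:
        if dep in failing:
            return root_cause(dep, depends_on, failing)
    return component

print(root_cause("gpu-job", depends_on, failing))  # optics-link
```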
Source: NETWORK WORLD
Cite this article:
Priyadharshini S (2025), DriveNets Expands AI Networking Fabric to Support Multi-Site Distributed GPU Clusters, AnaTechMaz, p. 164.

