Sharada Yeluri’s Post

Sharada Yeluri

Sr. Director of Engineering, Silicon and Systems Technology, @ Juniper Networks

I finally watched a few videos from this year's OCP conference. As expected, the networking sessions focused on improving performance, congestion, and resiliency for large GPU clusters. Meta and ByteDance presented Disaggregated Scheduled Fabric (DSF) as a cure-all solution. It's interesting that Meta is trying out DSF for its next-gen clusters after two generations of Ethernet fabrics with hop-by-hop scheduling.

Scheduled fabric isn't new; it's in all high-end modular switches. It is receiver-based scheduling between leaf switches in a leaf-spine topology. The sender requests credits for transmission. Once approved, data is fragmented into smaller cells, sprayed across fabric links, and reassembled by the destination leaf switch for delivery to the target GPU.

DSF offers hyperscalers several advantages. It enables them to treat the distributed chassis as a large pizza box/single router, shifting the burden of network utilization/performance to switch vendors. Resilience is often built into the hardware, enabling link switchovers without software intervention. DSF also eliminates the extensive tuning required for ECN and other schemes.

With DSF, packets are buffered in ingress leaf switches while waiting for grants, and packet buffering requirements increase linearly with scale. The egress leaf must have sufficient output buffering to hide the RTT (which depends on cable lengths) to avoid under-subscribed links. With limited on-chip SRAM buffering in ingress switches, traffic could spill to external memory, increasing tail latencies even as it prevents packet loss. ByteDance claims all traffic can remain on-chip in their 128-GPU DSF results, but this may not hold for larger systems.

Meta's DSF results at OCP with a 144-GPU cluster show a 10% improvement for bandwidth-intensive collectives like all-to-all. For other collectives, including all-reduce, the results are on par with the existing adaptive routing (with DLB) configuration used in Meta's other Ethernet fabrics. ByteDance saw similar results in their clusters. This raises the question of ROI for the expensive DSF fabrics. While the 10% advantage is significant, would it hold for larger clusters? We will have to wait for production network results.

DSF also adds control plane complexity. All DSF implementations are vendor-specific, requiring customers to understand each vendor's implementation. DSF, in a sense, is a closed vendor solution like InfiniBand. UEC also enables packet spray with a transport protocol that doesn't require strict ordering. It specifies combined sender- and receiver-based scheduling (similar to the VOQ scheduling in DSF) from the NICs. With AMD's announcement, UEC-compliant NICs (with programmable pipes) are on the horizon.

Time will tell whether DSF fizzles out in AI clusters when more economical UEC-compliant switches/NICs flood the market (wishful thinking 😊) or continues to coexist, with hyperscalers wanting turnkey solutions. Any thoughts? 🤔 #networkingforAI #AI #GPU #Junipernetworks
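For anyone who hasn't seen a scheduled fabric up close, here is a rough, purely illustrative sketch of that request/grant, cell-spray, and reassembly flow; the cell size, credit accounting, and link fan-out are simplifying assumptions, not any vendor's actual numbers:

```python
# Minimal sketch of the DSF-style request/grant, cell-spray, and reassembly
# flow described above. Purely illustrative: cell size, credit units, and the
# number of fabric links are assumptions, not any vendor's implementation.
from collections import defaultdict

CELL_SIZE = 256      # bytes per cell (assumed)
FABRIC_LINKS = 4     # parallel leaf-to-spine links to spray across (assumed)

class EgressScheduler:
    """Receiver-side grant logic on the egress leaf (greatly simplified)."""
    def __init__(self, credits_bytes: int):
        self.credits = credits_bytes

    def request(self, nbytes: int) -> int:
        # Grant only what the egress buffers can absorb right now.
        grant = min(nbytes, self.credits)
        self.credits -= grant
        return grant

def send_packet(payload: bytes, sched: EgressScheduler) -> bytes:
    # 1. The ingress leaf requests credits from the egress leaf scheduler.
    granted = sched.request(len(payload))
    if granted < len(payload):
        # In a real fabric the packet would wait in the ingress VOQ for more
        # credits; the sketch simply stops here.
        raise RuntimeError("insufficient credits; packet waits in ingress VOQ")

    # 2. Fragment the packet into cells and spray them across fabric links.
    cells = [(seq, payload[i:i + CELL_SIZE])
             for seq, i in enumerate(range(0, len(payload), CELL_SIZE))]
    per_link = defaultdict(list)
    for seq, cell in cells:
        per_link[seq % FABRIC_LINKS].append((seq, cell))   # round-robin spray

    # 3. The egress leaf reassembles by sequence number, whichever link each
    #    cell arrived on, before delivering to the target GPU's port.
    arrived = sorted(c for link_cells in per_link.values() for c in link_cells)
    return b"".join(cell for _, cell in arrived)

if __name__ == "__main__":
    sched = EgressScheduler(credits_bytes=64 * 1024)
    data = bytes(range(256)) * 8                # 2 KB toy "packet"
    assert send_packet(data, sched) == data     # reassembled in order
```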

  • diagram
Sharada Yeluri

Sr. Director of Engineering, Silicon and Systems Technology, @ Juniper Networks

4mo

Thanks, Aibing Zhou, for giving more insights on UEC congestion/incast control. The UEC NIC end-to-end congestion control is a combined sender- and receiver-based algorithm. The sender component deals with cases where the core of the fabric is oversubscribed (either by design or as a result of faulty links somewhere in the core). This, combined with no strict-ordering requirement, should result in a fabric that is as good as DSF when it comes to handling utilization and incast.
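Conceptually (this is my own simplified illustration, not the UEC specification), the NIC can be thought of as transmitting no more than the minimum of what the receiver has granted and what its sender-side window allows:

```python
# Conceptual sketch only -- not the UEC specification. It illustrates how a
# receiver-granted credit (pacing incast at the last hop) can be combined with
# a sender-side window (backing off when the fabric core is oversubscribed).
# All constants and method names here are assumptions for illustration.

class CombinedCongestionControl:
    def __init__(self, cwnd_bytes: int = 256 * 1024):
        self.cwnd = cwnd_bytes        # sender-side window (assumed starting size)
        self.receiver_credit = 0      # bytes granted by the destination NIC

    def on_grant(self, nbytes: int) -> None:
        # Receiver-based part: the destination paces sources into its buffers.
        self.receiver_credit += nbytes

    def on_congestion_signal(self) -> None:
        # Sender-based part: back off when the core marks/echoes congestion.
        self.cwnd = max(self.cwnd // 2, 4 * 1024)

    def on_ack(self, nbytes: int) -> None:
        # Gradually recover the sender window as acknowledged data drains.
        self.cwnd += nbytes // 8

    def sendable(self) -> int:
        # Transmit up to whichever constraint is tighter at this moment.
        return min(self.cwnd, self.receiver_credit)
```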

Chris Whyte

Principal Network Solutions Architect at Marvell Semiconductor

4mo

I have so many issues with DSFs, but let's first just focus on your statement that claims DSF offers hyperscalers several advantages: "It enables them to treat the distributed chassis as a large pizza box/single router, shifting the burden of network utilization/performance to switch vendors." From my experience, the one thing I've never heard a hyperscaler say is that they would like to shift the burden of some set of functions away from themselves and give it to switch vendors. Keep in mind, you're stating it as a "burden" when in fact it's not a burden but a function that allows the hyperscaler to have more control over the end-to-end solution. Certain switch vendors like DSFs because it gives them control by adding unnecessary complexity within the fabric. More complexity gives them the opportunity to sell more software and increase the COGS of the underlying hardware (e.g., by requiring deep buffers). In addition, it encourages them to build proprietary solutions that lock their customers into a single vendor, which is clearly evident in today's solutions.

Yossi Kikozashvili

Head of Product, AI Infrastructure

4mo

Hi :) The reasons scheduled fabric gains so much traction for GPU cluster deployments are pretty simple. First, it's really easy to use: unlike typical Ethernet, InfiniBand, or enhanced Ethernet (NIC-based scheduling), no tuning is needed; it simply works, regardless of the NIC you use and regardless of whether you invested days in tuning buffers and DCQCN thresholds. So ops simplicity is one thing. (BTW, that's one reason there's huge interest coming from enterprises that are short on the skill set and engineering force to tune buffers every time they deploy a model.) Second is the performance. It's simply better. ByteDance, at their OCP presentation, reported a 37% improvement for all-to-all-intensive workloads and a 20% improvement for all-reduce ones. That's of course compared to traditional Ethernet with hashing. Compared to packet-spraying-capable solutions, the advantage of scheduled fabric is reduced, but it is very far from being "on par". I had the luck and joy of working with ByteDance on this specific project, so I know :) Lastly, it's field proven. One fact some folks might not know: AT&T's next-gen core network is built with these scheduled fabrics, replacing legacy chassis with a resilient leaf-and-spine architecture.

Jeff Tantsura

Distinguished @Nvidia, building the best network and technologies to connect GPUs

4mo

100,000 GPUs, in production ;-)

Mike Reznikov

Network|Math|ML Acceleration (C/C++/RTL)

4mo

Sharada Yeluri I don't understand the numbers. Why are there so few GPUs (144) when testing large-scale functions? Thank you for the post.

David Williams

Senior Network Engineer

4mo

Love your posts. You always express complexity in a way that is consumable for those of us who have not tackled the kind of problems that are usually only handled by vendors or Goliath corporations. I have wanted to dig into the comparisons you make in this post. What jumped out at me was that "DSF, in a sense, is a closed vendor solution." Are there detailed documents available for the control plane solutions used with DSF, or is that all intellectual property that is not available to the public?


Super insightful analysis as always, Sharada. As you know, we (Juniper) have so much first-hand experience (the good and the bad) with DSFs from past projects. Can't wait to see how this all plays out; yet another reason why it's an exciting time to be a network technologist today!

Weiqiang Cheng

Research Institute of China Mobile - Technical Manager

4mo

This is an interesting topic! Last May, China Mobile initiated the establishment of a similar organization called GSE (Global Scheduling Ethernet). We have released a DSF that is similar to Broadcom's but is entirely based on an open Ethernet standard. Unlike Broadcom's DSF, GSE operates with Ethernet packets between the spine and leaf, rather than cells, and the VoQ is dynamically established, so there's no need to worry about excessive buffer usage. We have developed switches based on this standard and tested them in clusters with around one thousand GPUs. The all-to-all efficiency is comparable to ByteDance's test results, but during model training it shows even greater improvements, over 20% compared to traditional networks. Additionally, the overall cost seems to have an advantage: the combination of scheduled switches and simple NICs can be more cost-effective than using simple switches with smart NICs.
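Purely as an illustration of the "dynamically established VoQ" idea (my own sketch, not GSE's actual implementation), queues can be created on demand when traffic to a given egress appears and reclaimed once drained, instead of pre-allocating a deep queue per possible destination:

```python
# Illustrative sketch only -- not GSE's implementation. VOQs are created when
# the first packet for an (egress, priority) pair arrives and are reclaimed
# when they drain, so buffer usage tracks active destinations only.
from collections import deque

class DynamicVOQTable:
    def __init__(self):
        self.voqs = {}  # (egress_leaf, priority) -> deque of packets

    def enqueue(self, egress: int, prio: int, pkt: bytes) -> None:
        self.voqs.setdefault((egress, prio), deque()).append(pkt)

    def dequeue(self, egress: int, prio: int):
        q = self.voqs.get((egress, prio))
        if not q:
            return None
        pkt = q.popleft()
        if not q:                      # reclaim queue state once it is empty
            del self.voqs[(egress, prio)]
        return pkt
```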

Dennis Qin

AI/HPC infrastructure product manager at Supermicro

4mo

How about we step back a little bit? With two endpoints connected back to back with Ethernet, say 10G, the maximum you can get is 10G line rate; if you don't see throughput at 10G, then it's a problem with the endpoint application software stack. If you want to share this 10G Ethernet link among many apps and constantly keep it running at 10G line rate, then you need to work with the app stacks together to achieve this. Different apps may need different tweaks to make that happen, so how do we make Ethernet work with all the apps to achieve this? We developed TCP/IP on top of Ethernet, but then we found it doesn't work well for AI/LLM-type workloads, so we picked up IB, or DPUs with TCP/IP, etc., to achieve this. My 2 cents.

Nitin Kumar

Innovative Leader Driving Revenue Across Key Market Segments | Expert in Routing Solutions & Customer Engagement | Quality Champion & Diversity Advocate | Mentor & Coach

4mo

Thanks for the synopsis, Sharada. As usual, the battle between standards-compliant solutions (which are usually slower and, despite best efforts, take time to settle down) and vendor-specific solutions (agility being the advantage here, so long as they can strongly differentiate and showcase value compared to standards-based solutions) will continue until either (1) the standards-based solutions are so good that incremental improvements in vendor-specific solutions are not enough to warrant any consideration, or (2) vendor-specific solutions continue to clearly out-innovate the standards-based solutions and keep the lead. Both will continue to persist, maybe for different use cases. In any DC deployment where leaf and spine switches from different vendors are used, a standards-based solution, as you pointed out, is the only choice. But I believe there will be enough vendor-specific pod deployments (an entire pod built with leaves/spines from the same vendor) to justify DSF-like vendor-specific solutions. The push to commoditize any high-volume, high-cost devices/NPUs/NICs to continuously reduce costs will ultimately win for sure.
