In today’s rapidly evolving AI landscape, it’s clear that inference — not just training — is becoming the new scaling challenge. As models grow in size and capability, the infrastructure to serve them must scale with equal agility, speed, and efficiency.
We need to rethink how we keep up with that burgeoning demand for AI inference. It's becoming clear that the answer isn't just more powerful cards. It's a more powerful solution, one that rebuilds the concept of a data center rack from the ground up to power the extreme demands of AI inference.
At d-Matrix, we’ve spent the last several years building technology and products specifically for this moment. Today, we’re excited to share a major milestone: the launch of SquadRack™, our reference blueprint for the industry’s first rack-scale solution purpose-built for AI inference at datacenter scale. SquadRack is built around our Corsair™ accelerators and JetStream™ Transparent NIC and qualified with industry-leading AI infrastructure from Arista, Broadcom and Supermicro.
Why we’re designing a rack-scale AI inference solution
From day one, it was clear to us that AI inference wasn't going to be solved by simply putting a GPU in a rack. Critical bottlenecks were immediately apparent in compute, memory, and I/O. From the start, inference was a full-system problem, not a single-chip problem.
It was also obvious that classic architectures wouldn't scale with the crushing demand for AI inference. You've likely seen that demand first-hand: you might only get to use an advanced feature a few times a day, you're constantly rate-limited across new products, and you frequently can't access them at all as major model providers struggle with uptime.
Demand for these features and products isn't staying stable, either; it's only growing as we discover new ways to use them in our daily lives. We launched our AI accelerator Corsair to tackle the memory barrier and our Transparent NIC JetStream to tackle the I/O barrier. The next obvious step was to assemble them into a one-stop, rack-scale AI inference solution blueprint: SquadRack!
Working with the ecosystem to build a truly disaggregated solution purpose-built for AI inference
SquadRack is the industry's first rack-scale solution purpose-built for AI inference, using a disaggregated, standards-based approach.
We built Corsair and JetStream in industry-standard PCIe form factors, so they can be easily integrated into standard PCIe Gen 5 AI servers. JetStream enables fast accelerator-to-accelerator communication using an industry-standard Ethernet protocol and integrates seamlessly with standard Ethernet top-of-rack switches. This makes it easy for datacenter operators to right-size their rack-level deployments based on current inference demand, with the flexibility to grow the deployed footprint as demand increases over time.
SquadRack is a reference rack blueprint: the number of nodes per rack can be customized to the datacenter environment, whether the constraint is rack power or rack height. It is also air-cooled and has no special cooling infrastructure requirements. We built SquadRack to make innovative, high-performance AI inference accessible and easy to deploy in existing datacenters around the world.
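As a rough illustration of what right-sizing a rack can look like, here is a minimal sketch that estimates how many accelerator nodes fit within a rack's power and height budgets. All of the figures (per-node power draw, node height, switch overhead) are hypothetical placeholders for illustration only, not d-Matrix or SquadRack specifications.

```python
# Hypothetical rack-sizing sketch: how many inference nodes fit in a rack
# given power and height budgets. All numbers are illustrative placeholders,
# not d-Matrix or SquadRack specifications.

def nodes_per_rack(rack_power_kw: float,
                   rack_height_u: int,
                   node_power_kw: float = 6.5,   # assumed per-server draw
                   node_height_u: int = 4,        # assumed server height in U
                   switch_power_kw: float = 1.0,  # assumed ToR switch draw
                   switch_height_u: int = 1) -> int:
    """Return the node count allowed by the tighter of the two budgets."""
    power_budget = rack_power_kw - switch_power_kw
    height_budget = rack_height_u - switch_height_u
    by_power = int(power_budget // node_power_kw)
    by_height = int(height_budget // node_height_u)
    return max(0, min(by_power, by_height))

if __name__ == "__main__":
    # Example: a 40 kW, 42U rack versus a smaller 25 kW, 24U rack.
    for power_kw, height_u in [(40, 42), (25, 24)]:
        n = nodes_per_rack(power_kw, height_u)
        print(f"{power_kw} kW / {height_u}U rack -> {n} nodes")
```

A real deployment would also account for cabling, redundancy, and cooling headroom; the point of the sketch is simply that with a disaggregated, standards-based design, rack size becomes a deployment-time decision rather than a fixed product configuration.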
Supporting open standards and customer optionality doesn’t just benefit customers—it benefits everyone working to build AI inference solutions in the first place. Building with the ecosystem creates massive opportunities to learn and grow with the runaway demand for AI inference.
We saw the need for SquadRack, but it didn't make sense to reinvent the wheel. Many high-performance AI servers and Ethernet switches are already on the market and deployed in data centers around the world. Our adoption of the PCIe and Ethernet open standards made it possible for us to partner with industry-leading AI infrastructure providers such as Arista, Broadcom and Supermicro.
Building with the ecosystem, rather than replacing it entirely, is the pathway to meeting that demand for any developer working on AI-driven applications. With the rapid growth of agentic AI, reasoning, and video generation use cases across enterprises and industry verticals, our customers now have the option to rapidly deploy ultra-low-latency batched inference at scale.
If you're interested in working with us as we build with the ecosystem, we'd love to hear from you.