What does success mean for agentic networks?
Just a year ago, the quality of AI models was measured with a mixture of scientific benchmarks, LMArena rankings, and — weirdly, most importantly — vibes. Agentic networks have changed…
When we launched seven years ago, we had one goal: to build the fastest and most scalable technology to power small-batch AI inference and interactive applications. Both of those have…
Delivering high-quality AI-powered applications historically relied on massive models. That came with significant scaling limitations, as deploying models with more than 100B parameters while maximizing token generation doesn’t scale up without sacrificing latency…
Conversations around fast inference typically focus on one approach: blazing-fast token generation with gargantuan models. Both still play an important part in many circles. Models come in huge variants, including Qwen (235B)…
The first cloud computing revolution triggered a tsunami-grade buildout for data centers—both for emerging hyperscalers like Amazon Web Services and for companies re-orienting their systems around cloud-based operations. At the time, we were…
Today, some of the most prevalent AI-powered applications are highly interactive ones: coding companions, customer service chatbots, and others. While they all have different measures of success, they’re increasingly aligned on one…
In the early days of transformer-based AI workloads, generation speed wasn’t that big of an issue. Users generally tolerated slower inference due to novelty, and serious use cases outside of…
Time-to-first-token, or TTFT, has generally been one of many key performance indicators for AI-based applications. Consumer-facing chatbots, for example, might need it to be very low, while enterprise cases can…
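To make the metric concrete, here is a minimal sketch of measuring TTFT against a generic streaming HTTP endpoint. The URL, model name, and payload below are placeholders, not any specific provider's API; the idea is simply to time the gap between sending a request and receiving the first streamed chunk.

```python
import time
import requests

def measure_ttft(url: str, payload: dict) -> float | None:
    """Send a streaming request and return time-to-first-token in seconds."""
    start = time.perf_counter()
    with requests.post(url, json=payload, stream=True) as resp:
        resp.raise_for_status()
        # The first non-empty chunk off the wire approximates the first token.
        for chunk in resp.iter_content(chunk_size=None):
            if chunk:
                return time.perf_counter() - start
    return None  # stream ended without producing any output

# Hypothetical usage (endpoint and model name are placeholders):
# ttft = measure_ttft(
#     "https://api.example.com/v1/chat/completions",
#     {"model": "example-model", "stream": True,
#      "messages": [{"role": "user", "content": "Hello"}]},
# )
# print(f"TTFT: {ttft:.3f}s")
```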
In today’s rapidly evolving AI landscape, it’s clear that inference — not just training — is becoming the new scaling challenge. As models grow in size and capability, the infrastructure…
We are inevitably moving toward a world where AI applications need substantially better performance, and classic GPUs alone can’t keep up. At the same time, customers need flexibility in…