Inference Archives

Why we needed a new Transparent NIC solution

September 18, 2025

Large-scale AI workloads are deployed by stringing together many cards and many nodes. Any increase in model size, context length or users impacts the number of cards needed. Gone are…Read More

Blazing the Trail Toward More Scalable, Affordable AI with 3DIMC

August 25, 2025

AI adoption is accelerating at an unprecedented pace, but the economics of running AI models are becoming unsustainable. And the cracks are beginning to show. As more people rush to…Read More

Why optimizing every layer of AI workloads—from software to infrastructure—is now critical as apps take off

August 14, 2025

At this point, we’ve become accustomed to a brand-new model coming out every few weeks that one-ups the last one. Developers don’t sit with just a single model—in fact,…Read More

Multimodal agents are here—they just aren’t what you expect

August 12, 2025

The future of AI workflows is almost certainly going to be multimodal agents. But rather than incredibly complex, compute-hungry multimodal models, there’s already a much easier pathway to get there. …Read More

What is AI Inference and why it matters in the age of Generative AI

June 4, 2025

‘AI Inference’ has been trending everywhere, from keynote speeches to quarterly earnings reports and in the news. You have probably heard phrases like “inference-time compute”, “reasoning” and “AI deployment” and…Read More

Impact of the DeepSeek Moment on Inference Compute

January 31, 2025

In this talk, d-Matrix CTO/Cofounder Sudeep Bhoja discusses the impact that the release of the DeepSeek R1 model is having on inference compute. Stepping through the evolution of reasoning…Read More