Chiplet packaging is catching on with companies designing high-performance processors for data center and AI applications. While familiar names such as Intel and AMD are in this space, so are some smaller startup companies. One of them is d-Matrix, a young company developing technology for AI-compute and inference processors.
Today, d-Matrix, a company focused on building accelerators for the complex matrix math that underpins machine learning, announced a $44 million Series A round. Playground Global led the round with support from Microsoft’s M12 and SK Hynix. The three join existing investors Nautilus Venture Partners, Marvell Technology and Entrada Ventures.
Hardware startup d-Matrix says the $44 million it raised in a Series A round today will help it continue development of a novel “chiplet” architecture that uses 6-nanometer chiplets embedding compute in SRAM memory modules to accelerate AI workloads.
Abstract: The advent of large transformer-based language models (BERT, GPT-3, ChatGPT, LaMDA, Switch) for natural language processing (NLP), and their explosive growth across generative AI business and consumer applications, has made it imperative for AI-accelerated computing solutions to deliver order-of-magnitude improvements in efficiency. We will discuss a modular, chiplet-based spatial CGRA-like architecture optimized for generative inference, along with a generalized framework for successfully implementing deep RL-based mappers in compilers for spatial and temporal architectures. We will present results for weight and activation quantization in block floating-point formats, building on GPTQ and SmoothQuant, and their support in PyTorch. To reduce KV cache size and bandwidth, we will present an extension to EL-attention.
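The abstract mentions block floating-point quantization of weights and activations with PyTorch support but gives no implementation detail. As a rough illustration only (not d-Matrix’s actual format or code), the sketch below shows the basic idea: each block of values shares one exponent derived from the block’s largest magnitude, and the per-value mantissas are rounded to a fixed bit width. The block size, mantissa width, and function name are assumptions.

```python
import torch

def quantize_block_fp(x: torch.Tensor, block_size: int = 16, mantissa_bits: int = 8) -> torch.Tensor:
    """Toy block floating-point (BFP) quantize/dequantize round trip.
    Each block of `block_size` consecutive values shares one exponent taken
    from the block's max magnitude; mantissas are rounded to `mantissa_bits`
    (sign included). These parameters are illustrative assumptions."""
    orig_shape = x.shape
    flat = x.flatten()
    pad = (-flat.numel()) % block_size                 # pad so length divides evenly
    flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.view(-1, block_size)

    # Shared exponent per block, from the largest magnitude in the block.
    max_abs = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-30)
    shared_exp = torch.floor(torch.log2(max_abs))

    # Scale so mantissas fit the signed integer range, round, then rescale.
    mant_levels = 2 ** (mantissa_bits - 1) - 1         # e.g. 127 for 8 bits
    scale = 2.0 ** shared_exp / mant_levels
    mantissas = torch.clamp(torch.round(blocks / scale), -mant_levels, mant_levels)
    dequant = mantissas * scale

    return dequant.flatten()[: x.numel()].view(orig_shape)

# Example: quantization error stays small relative to each block's magnitude.
w = torch.randn(4, 32)
w_q = quantize_block_fp(w)
print((w - w_q).abs().max().item())
```

In a real BFP pipeline the integer mantissas and shared exponents would be stored and operated on directly rather than dequantized back to floats, but the rounding behavior is the same.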
- The cost to develop and maintain the software can be extraordinarily high.
- Nvidia makes most of the GPUs for the AI industry, and its primary data center workhorse chip costs $10,000.
- Analysts and technologists estimate that the critical process of training a large language model such as GPT-3 could cost over $4 million (a rough estimate follows below).
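For context on that training-cost figure, here is a back-of-the-envelope estimate. The parameter and token counts are the commonly cited GPT-3 figures; the GPU throughput, utilization, and hourly rate are assumptions, not numbers from the article.

```python
# Rough GPT-3 training cost estimate. All inputs are assumptions for
# illustration: parameter/token counts are the commonly cited figures,
# and the throughput, utilization, and hourly rate are guesses.
params = 175e9                 # GPT-3 parameter count
tokens = 300e9                 # training tokens
flops = 6 * params * tokens    # ~6 FLOPs per parameter per token (rule of thumb)

peak_flops = 312e12            # A100 BF16 peak, FLOP/s
utilization = 0.30             # assumed fraction of peak actually sustained
effective = peak_flops * utilization

gpu_hours = flops / effective / 3600
price_per_gpu_hour = 4.00      # assumed cloud rate in USD

print(f"GPU-hours: {gpu_hours:,.0f}")                      # ~935,000 GPU-hours
print(f"Estimated cost: ${gpu_hours * price_per_gpu_hour:,.0f}")  # ~$3.7M with these assumptions
```

The result lands in the low millions of dollars, in line with the multi-million-dollar estimates cited above; the exact figure swings widely with the assumed utilization and hourly rate.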
One of the hottest trends in artificial intelligence (AI) this year has been the emergence of popular generative AI models. With technologies such as DALL-E and Stable Diffusion gaining traction, a growing number of startups and use cases are emerging.
The memory wall refers to the physical barriers limiting how fast data can be moved in and out of memory. It’s a fundamental limitation with traditional architectures. In-memory computing or IMC addresses this challenge by running AI matrix calculations directly in the memory module, avoiding the overhead of sending data across the memory bus.
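To make the memory wall concrete, a simple roofline-style calculation shows why inference-style matrix work is usually limited by memory bandwidth rather than raw compute. The hardware numbers below are generic assumptions, not figures for d-Matrix or any specific accelerator.

```python
# Roofline-style check: is a matrix-vector multiply (typical of inference)
# limited by compute or by memory bandwidth? Hardware numbers are generic
# assumptions, not specs for any particular chip.
peak_flops = 300e12        # assumed peak compute, FLOP/s
mem_bandwidth = 2e12       # assumed off-chip memory bandwidth, bytes/s

def matvec_time(n: int, bytes_per_element: int = 2) -> tuple[float, float]:
    """Return (compute_time, memory_time) in seconds for an n x n matrix-vector multiply."""
    flops = 2 * n * n                         # one multiply + one add per weight
    bytes_moved = n * n * bytes_per_element   # every weight is read once from memory
    return flops / peak_flops, bytes_moved / mem_bandwidth

for n in (4096, 16384):
    t_compute, t_memory = matvec_time(n)
    bound = "memory-bound" if t_memory > t_compute else "compute-bound"
    print(f"n={n}: compute {t_compute*1e6:.1f} us, memory {t_memory*1e6:.1f} us -> {bound}")
```

With these assumptions the memory time exceeds the compute time by two orders of magnitude, which is the bottleneck IMC targets: performing the matrix arithmetic inside the memory arrays removes most of that weight traffic over the memory bus.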