Breaking NVIDIA's "monopoly" with differentiation, d-Matrix reduces the cost of AI inference computing power by 30 times
Original source: Alpha Commune
Behind the explosion of AIGC lies massive demand for AI training and AI inference. NVIDIA is currently the largest provider of AI computing power, and its second-quarter profit (up 854% year-on-year) signals that the industry's demand for AI compute is far from being met.
NVIDIA's near-monopoly on AI computing power (a market share of more than 80%) worries many of the companies that depend on it. Microsoft, Amazon, and OpenAI are actively developing their own chips, and OpenAI has also been the subject of acquisition rumors involving AI chip startups such as Cerebras and Atomic Semi.
In the future, the inference compute needed to run AI applications will far exceed the compute needed to train large models. Inference requirements also differ from training requirements, and existing GPUs have no cost advantage when used for inference, which calls for purpose-built AI inference chips.
Recently, d-Matrix, a startup focused on AI inference chips, raised $110 million in Series B financing, led by Temasek and joined by earlier investors such as Playground Global, M12 (Microsoft's venture fund), Industry Ventures, Ericsson Ventures, Samsung Ventures, and SK Hynix, with strategic industrial investors making up a considerable share. Sid Sheth, CEO of d-Matrix, said: "These are investors who know how to build a semiconductor business and can work with us for the long term."
d-Matrix's new funding will be used to build Corsair, its digital in-memory computing (DIMC) chiplet inference compute card. The card is said to be 9 times faster than NVIDIA's H100 GPU, and when the cards are clustered, it is claimed to be 20 times more power efficient, with 20 times lower latency and up to 30 times lower cost than comparable NVIDIA solutions.
Two chip veterans target AI inference compute needs in the AIGC era
AI systems use different types of computation for training models than for making predictions (inference). A single inference requires far less compute than training, but over the lifetime of a large AI service, the cumulative inference compute exceeds the training compute.
With existing AI hardware, it is difficult to deploy a dedicated data center for AI inference at low cost. Microsoft's GitHub Copilot service reportedly loses an average of $20 per user per month, and according to Dylan Patel, principal analyst at SemiAnalysis, OpenAI's daily cost of running ChatGPT may be as high as $700,000. These are inference costs that cannot be avoided when operating an AI service.
For the AI industry to develop in a healthier way, it needs AI inference chips with lower inference costs and lower energy consumption.
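To put these figures in perspective, here is a tiny back-of-the-envelope calculation. The ChatGPT and Copilot numbers come from the reports quoted above; the Copilot user count is a purely hypothetical assumption used only to show the scale involved.

```python
# Quick arithmetic on the inference-cost figures quoted above.
# The Copilot user count is a hypothetical assumption, not a reported figure.

CHATGPT_DAILY_COST_USD = 700_000        # SemiAnalysis estimate quoted above
COPILOT_LOSS_PER_USER_MONTH_USD = 20    # reported average loss per user

annual_chatgpt_cost = CHATGPT_DAILY_COST_USD * 365
print(f"ChatGPT inference cost per year : ${annual_chatgpt_cost/1e6:.0f}M")   # ~$256M

hypothetical_copilot_users = 1_000_000  # assumption for illustration only
annual_copilot_loss = COPILOT_LOSS_PER_USER_MONTH_USD * 12 * hypothetical_copilot_users
print(f"Copilot loss at 1M users, per yr: ${annual_copilot_loss/1e6:.0f}M")   # ~$240M
```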
Two chip industry veterans, Sid Sheth and Sudeep Bhoja, founded d-Matrix in 2019 after working together at Marvell and Broadcom. At the time, Transformer-based AI models were just emerging; the two saw the great potential of this architecture and decided to design AI hardware specifically for these large language models.
Sid Sheth described the uniqueness of d-Matrix's market positioning: "Generative AI will forever change the paradigm of how people and companies create, work, and interact with technology.
But the total cost of ownership (TCO) of running AI inference is rising rapidly. The d-Matrix team is changing the cost economics of deploying AI inference with computing solutions purpose-built for large language models, and this funding round further confirms our position in the industry."
Michael Stewart, an investor at Microsoft's M12, said: "d-Matrix is entering production just as the TCO of large language model inference is becoming a key limiting factor for enterprises that want to use advanced AI in their services and applications. d-Matrix has followed a roadmap that will deliver industry-leading TCO for a wide range of potential model-serving scenarios, using a flexible, resilient chiplet architecture built on a memory-centric approach."
Reducing the cost of AI inference by up to 30x
Using CPUs and GPUs for AI training and inference is not the most efficient approach. For AI inference, data movement is the biggest bottleneck: shuttling data back and forth to random access memory introduces significant latency, which drives up energy consumption and cost and slows down the entire AI system.
There are three ways to solve this problem.
The first accelerates deep learning by reducing the amount of data processed, through sampling and pipelining, but this limits accuracy and precision.
The second is to place a dedicated AI engine alongside the traditional processor, an approach used by Apple, NVIDIA, Intel, and AMD. These solutions still rely on the conventional von Neumann architecture, integrating SRAM with external DRAM, so they must still move data in and out of memory, which keeps power consumption high and efficiency low.
The third is to move the computation closer to the memory itself, which is the approach d-Matrix takes. Its engine architecture, called digital in-memory computing (DIMC), reduces latency and energy consumption. It is also well suited to AI inference, because inference repeatedly accesses a relatively static (but large) set of weights, and DIMC eliminates most of the energy cost and delay of moving that data, as the rough estimate below illustrates.
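To make the data-movement argument concrete, here is a minimal roofline-style sketch. All numbers are illustrative assumptions (a hypothetical 13B-parameter model, GPU-like bandwidth and throughput), not d-Matrix or NVIDIA specifications; the point is only the relative size of the two terms.

```python
# Rough roofline-style estimate of why LLM inference is usually memory-bound.
# All numbers below are illustrative assumptions, not vendor specifications.

def inference_time_estimate(params_billion, bytes_per_param, mem_bw_tb_s, compute_tops):
    """Return (memory_bound_time, compute_bound_time) in seconds for
    generating one token with a weight-streaming decoder model."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    # One token of autoregressive decoding touches every weight once.
    t_memory = weight_bytes / (mem_bw_tb_s * 1e12)
    # Roughly 2 ops (multiply + add) per parameter per token.
    ops = 2 * params_billion * 1e9
    t_compute = ops / (compute_tops * 1e12)
    return t_memory, t_compute

# Hypothetical 13B-parameter model in 8-bit weights on a GPU-like device:
# ~2 TB/s of DRAM bandwidth and ~1000 TOPS of int8 compute.
t_mem, t_comp = inference_time_estimate(13, 1, 2.0, 1000)
print(f"memory-bound token time : {t_mem*1e3:.2f} ms")   # ~6.5 ms
print(f"compute-bound token time: {t_comp*1e3:.3f} ms")  # ~0.026 ms
# The two-orders-of-magnitude gap is why keeping weights in (or next to)
# SRAM, as in-memory computing does, matters more than adding raw FLOPS.
```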
d-Matrix uses multiple chiplets to assemble larger, modular, scalable integrated circuits, which lets it offer scalable platforms for enterprise-grade AI inference and helps AI companies improve performance and efficiency.
Jayhawk II Chiplet
In 2021, d-Matrix launched the Nighthawk chiplet, followed by the Jayhawk chiplet platform, the industry's first Open Domain-Specific Architecture (ODSA) Bunch of Wires (BoW) chiplet platform, designed to provide energy-efficient, organic-substrate-based chip-to-chip connectivity.
Each Jayhawk II chiplet contains a RISC-V core for management, 32 Apollo cores (each with eight DIMC units operating in parallel), and 256 MB of SRAM with 150 TB/s of bandwidth. The cores are connected by a dedicated on-chip network with 84 TB/s of bandwidth.
Corsair Compute Card
d-Matrix also introduced the Corsair compute card, a form factor similar to NVIDIA's H100. Each Corsair card carries 8 Jayhawk II chiplets; each Jayhawk II provides 2 Tb/s (250 GB/s) of chip-to-chip bandwidth, and a single Corsair card has 8 Tb/s (1 TB/s) of aggregate chip-to-chip bandwidth.
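Putting the published chiplet and card figures together gives a sense of what one Corsair card aggregates. This is only a back-of-the-envelope tally of the numbers quoted in this article, not an official spec sheet.

```python
# Back-of-the-envelope totals for one Corsair card, using only the figures
# quoted in this article; treat the results as indicative, not official specs.

CHIPLETS_PER_CARD = 8            # Jayhawk II chiplets per Corsair card
APOLLO_CORES_PER_CHIPLET = 32    # each chiplet also has one RISC-V management core
DIMC_UNITS_PER_CORE = 8
SRAM_MB_PER_CHIPLET = 256        # on-chip SRAM, quoted at 150 TB/s per chiplet

apollo_cores = CHIPLETS_PER_CARD * APOLLO_CORES_PER_CHIPLET      # 256 cores
dimc_units = apollo_cores * DIMC_UNITS_PER_CORE                  # 2,048 DIMC units
sram_gb = CHIPLETS_PER_CARD * SRAM_MB_PER_CHIPLET / 1024         # 2 GB of SRAM

print(f"Apollo cores per card : {apollo_cores}")
print(f"DIMC units per card   : {dimc_units}")
print(f"On-chip SRAM per card : {sram_gb:.0f} GB")
# Roughly 2 GB of SRAM per card helps explain the focus on small and
# medium-sized models: larger weight sets must be sharded across many cards
# (or quantized aggressively) before they can sit close to the compute.
```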
d-Matrix claims that servers equipped with Corsair compute cards reduce the total cost of ownership of generative AI inference by 10 to 30 times compared with GPU-based solutions, though the hardware will not be generally available until 2024.
d-Matrix Aviator software stack
NVIDIA's strength in AI computing lies not only in its GPUs but also in its CUDA software stack and the many libraries optimized for specific workloads and use cases, which together form a complete ecosystem.
Targeting relatively small models
Sid Sheth, CEO of d-Matrix, pointed out that beyond focusing on AI inference, the company specifically targets small and medium-sized models with several billion to tens of billions of parameters, rather than general-purpose models with hundreds of billions of parameters.
Karl Freund, founder and principal analyst at semiconductor and AI research firm Cambrian AI, agrees: "Most companies don't deploy models with hundreds of billions or trillions of parameters. They will fine-tune models on their own data, and the models they actually deploy will be much smaller. For models of this size, the NVIDIA H100, which currently sells for up to $40,000, is not necessarily the most economical option for AI inference."
He also pointed out that d-Matrix faces a window of opportunity: it has a relatively open stretch of time in which to prove its value before giants such as NVIDIA turn to this market.
For now, d-Matrix expects revenue of no more than $10 million this year, mostly from customers buying chips for evaluation. Founder Sheth said d-Matrix expects annual revenue of $70-75 million or more within two years, at which point it would break even. The market it faces is huge: Cambrian AI predicts that by 2030, the compute-per-watt of AI inference chips could exceed 1,000 TOPS per watt.
Autonomy and cost are fertile ground for AI chip startups
On the one hand, AI chip startups such as d-Matrix survive because AI companies want independent, controllable supplies of compute. Whether giants like Microsoft, Meta, and Amazon, super-unicorns like OpenAI and Anthropic, or leading startups like Cohere, none of them want their AI compute to be tied to a single supplier.
On the other hand, there is the operating cost of AI services. For large-model companies, the long-run compute cost of running AI services will exceed the compute cost of training the models, and at this stage many AI companies lose money on every user, with a high total cost of ownership (TCO). Cash-rich giants can absorb this loss, but for startups it is a heavy burden that slows the further expansion of their business.
Both the giants and the startups urgently need third-party, low-cost AI inference compute.
What risks do startups in the AI chip field face at this stage? One, of course, is NVIDIA's near-monopoly; another is that the largest AI players, Microsoft, Meta, Google, and OpenAI, are developing their own chips; and then there is the problem of building a software ecosystem around the chip.
d-Matrix is working through these problems. It targets the market for commercial small and medium-sized AI models and is cooperating with the open-source community to build a software ecosystem, which can give it a differentiated competitive advantage against the giants.