Breaking NVIDIA's "monopoly" with differentiation, d-Matrix reduces the cost of AI inference computing power by 30 times
Original source: Alpha Commune
Behind the explosion of AIGC lies massive demand for AI training and AI inference. NVIDIA is currently the largest provider of AI computing power, and its second-quarter profit (up 854% year-on-year) signals that the industry's demand for AI compute is far from being met.
NVIDIA's near-monopoly on AI computing power (a market share of more than 80%) worries many of the companies that depend on it. Microsoft, Amazon, and OpenAI are actively developing their own chips, and OpenAI has also been the subject of acquisition rumors involving AI chip startups such as Cerebras and Atomic Semi.
In the future, the inference compute needed to run AI applications will far exceed the compute needed to train large models. Inference requirements also differ from training requirements, and existing GPUs have no cost advantage when used for inference, which calls for purpose-built AI inference chips.
Recently, d-Matrix, a startup focused on AI inference chips, raised $110 million in Series B financing, led by Temasek and joined by earlier investors such as Playground Global, M12 (Microsoft's venture fund), Industry Ventures, Ericsson Ventures, Samsung Ventures, and SK Hynix, with strategic industrial investors making up a considerable share. Sid Sheth, CEO of d-Matrix, said: "These are investors who know how to build a semiconductor business and can work with us for the long term."
d-Matrix's new funding will be used to build Corsair, its digital in-memory computing (DIMC) chiplet inference compute card. The card is said to be 9 times faster than NVIDIA's H100 GPU, and when the cards are clustered, it is claimed to be 20 times more power efficient, with 20 times lower latency and up to 30 times lower cost than comparable NVIDIA solutions.
Two chip veterans target AI inference compute needs in the AIGC era
AI systems use different types of computation for training models than for making predictions (inference). A single inference requires far less compute than training, but over the lifetime of a large AI service, the cumulative inference compute exceeds the training compute.
With existing AI hardware, it is difficult to deploy a dedicated data center for AI inference at low cost. Microsoft's GitHub Copilot service reportedly loses an average of $20 per user per month, and according to Dylan Patel, principal analyst at SemiAnalysis, OpenAI's daily cost of running ChatGPT may be as high as $700,000. These are inference costs that cannot be avoided when operating an AI service.
For the AI industry to develop in a healthier way, it needs AI inference chips with lower inference costs and lower energy consumption.
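To put these figures in perspective, here is a tiny back-of-the-envelope calculation. The ChatGPT and Copilot numbers come from the reports quoted above; the Copilot user count is a purely hypothetical assumption used only to show the scale involved.

```python
# Quick arithmetic on the inference-cost figures quoted above.
# The Copilot user count is a hypothetical assumption, not a reported figure.

CHATGPT_DAILY_COST_USD = 700_000        # SemiAnalysis estimate quoted above
COPILOT_LOSS_PER_USER_MONTH_USD = 20    # reported average loss per user

annual_chatgpt_cost = CHATGPT_DAILY_COST_USD * 365
print(f"ChatGPT inference cost per year : ${annual_chatgpt_cost/1e6:.0f}M")   # ~$256M

hypothetical_copilot_users = 1_000_000  # assumption for illustration only
annual_copilot_loss = COPILOT_LOSS_PER_USER_MONTH_USD * 12 * hypothetical_copilot_users
print(f"Copilot loss at 1M users, per yr: ${annual_copilot_loss/1e6:.0f}M")   # ~$240M
```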
Two chip industry veterans, Sid Sheth and Sudeep Bhoja, founded d-Matrix in 2019 after working together at Marvell and Broadcom. At the time, Transformer-based AI models were just emerging; the two saw the great potential of this architecture and decided to design AI hardware specifically for these large language models.
Sid Sheth described the uniqueness of d-Matrix's market positioning: "Generative AI will forever change the paradigm of how people and companies create, work, and interact with technology.
But the total cost of ownership (TCO) of running AI inference is rising rapidly. The d-Matrix team is changing the cost economics of deploying AI inference with computing solutions purpose-built for large language models, and this funding round further confirms our position in the industry."
Michael Stewart, an investor at Microsoft's M12, said: "d-Matrix is entering production just as the TCO of large language model inference is becoming a key limiting factor for enterprises that want to use advanced AI in their services and applications. d-Matrix has followed a roadmap that will deliver industry-leading TCO for a wide range of potential model-serving scenarios, using a flexible, resilient chiplet architecture built on a memory-centric approach."
Reducing the cost of AI inference by up to 30x
Using CPUs and GPUs for AI training and inference is not the most efficient approach. For AI inference, data movement is the biggest bottleneck: shuttling data back and forth to random access memory introduces significant latency, which drives up energy consumption and cost and slows down the entire AI system.
There are three ways to solve this problem.
The first accelerates deep learning by reducing the amount of data processed, through sampling and pipelining, but this limits accuracy and precision.
The second is to place a dedicated AI engine alongside the traditional processor, an approach used by Apple, NVIDIA, Intel, and AMD. These solutions still rely on the conventional von Neumann architecture, integrating SRAM with external DRAM, so they must still move data in and out of memory, which keeps power consumption high and efficiency low.
The third is to move the computation closer to the memory itself, which is the approach d-Matrix takes. Its engine architecture, called digital in-memory computing (DIMC), reduces latency and energy consumption. It is also well suited to AI inference, because inference repeatedly accesses a relatively static (but large) set of weights, and DIMC eliminates most of the energy cost and delay of moving that data, as the rough estimate below illustrates.
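To make the data-movement argument concrete, here is a minimal roofline-style sketch. All numbers are illustrative assumptions (a hypothetical 13B-parameter model, GPU-like bandwidth and throughput), not d-Matrix or NVIDIA specifications; the point is only the relative size of the two terms.

```python
# Rough roofline-style estimate of why LLM inference is usually memory-bound.
# All numbers below are illustrative assumptions, not vendor specifications.

def inference_time_estimate(params_billion, bytes_per_param, mem_bw_tb_s, compute_tops):
    """Return (memory_bound_time, compute_bound_time) in seconds for
    generating one token with a weight-streaming decoder model."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    # One token of autoregressive decoding touches every weight once.
    t_memory = weight_bytes / (mem_bw_tb_s * 1e12)
    # Roughly 2 ops (multiply + add) per parameter per token.
    ops = 2 * params_billion * 1e9
    t_compute = ops / (compute_tops * 1e12)
    return t_memory, t_compute

# Hypothetical 13B-parameter model in 8-bit weights on a GPU-like device:
# ~2 TB/s of DRAM bandwidth and ~1000 TOPS of int8 compute.
t_mem, t_comp = inference_time_estimate(13, 1, 2.0, 1000)
print(f"memory-bound token time : {t_mem*1e3:.2f} ms")   # ~6.5 ms
print(f"compute-bound token time: {t_comp*1e3:.3f} ms")  # ~0.026 ms
# The two-orders-of-magnitude gap is why keeping weights in (or next to)
# SRAM, as in-memory computing does, matters more than adding raw FLOPS.
```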
d-Matrix uses multiple chiplets to assemble larger, modular, scalable integrated circuits, which lets it offer scalable platforms for enterprise-grade AI inference and helps AI companies improve performance and efficiency.
Jayhawk II Chiplet
In 2021, d-Matrix launched the Nighthawk chiplet, followed by the Jayhawk chiplet platform, the industry's first Open Domain-Specific Architecture (ODSA) Bunch of Wires (BoW) chiplet platform, designed to provide energy-efficient, organic-substrate-based chip-to-chip connectivity.
Each Jayhawk II chiplet contains a RISC-V core for management, 32 Apollo cores (each with eight DIMC units operating in parallel), and 256 MB of SRAM with 150 TB/s of bandwidth. The cores are connected by a dedicated on-chip network with 84 TB/s of bandwidth.
Corsair Compute Card
d-Matrix also introduced the Corsair compute card, a form factor similar to NVIDIA's H100. Each Corsair card carries 8 Jayhawk II chiplets; each Jayhawk II provides 2 Tb/s (250 GB/s) of chip-to-chip bandwidth, and a single Corsair card has 8 Tb/s (1 TB/s) of aggregate chip-to-chip bandwidth.
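Putting the published chiplet and card figures together gives a sense of what one Corsair card aggregates. This is only a back-of-the-envelope tally of the numbers quoted in this article, not an official spec sheet.

```python
# Back-of-the-envelope totals for one Corsair card, using only the figures
# quoted in this article; treat the results as indicative, not official specs.

CHIPLETS_PER_CARD = 8            # Jayhawk II chiplets per Corsair card
APOLLO_CORES_PER_CHIPLET = 32    # each chiplet also has one RISC-V management core
DIMC_UNITS_PER_CORE = 8
SRAM_MB_PER_CHIPLET = 256        # on-chip SRAM, quoted at 150 TB/s per chiplet

apollo_cores = CHIPLETS_PER_CARD * APOLLO_CORES_PER_CHIPLET      # 256 cores
dimc_units = apollo_cores * DIMC_UNITS_PER_CORE                  # 2,048 DIMC units
sram_gb = CHIPLETS_PER_CARD * SRAM_MB_PER_CHIPLET / 1024         # 2 GB of SRAM

print(f"Apollo cores per card : {apollo_cores}")
print(f"DIMC units per card   : {dimc_units}")
print(f"On-chip SRAM per card : {sram_gb:.0f} GB")
# Roughly 2 GB of SRAM per card helps explain the focus on small and
# medium-sized models: larger weight sets must be sharded across many cards
# (or quantized aggressively) before they can sit close to the compute.
```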
d-Matrix claims that servers equipped with Corsair compute cards reduce the total cost of ownership of generative AI inference by 10 to 30 times compared with GPU-based solutions, though the hardware will not be generally available until 2024.
d-Matrix Aviator software stack
NVIDIA's strength in AI computing lies not only in its GPUs but also in its CUDA software stack and the many libraries optimized for specific workloads and use cases, which together form a complete ecosystem.
Targeting relatively small models
Sid Sheth, CEO of d-Matrix, pointed out that beyond focusing on AI inference, the company specifically targets small and medium-sized models with several billion to tens of billions of parameters, rather than general-purpose models with hundreds of billions of parameters.
Karl Freund, founder and principal analyst at semiconductor and AI research firm Cambrian AI, agrees: "Most companies don't deploy models with hundreds of billions or trillions of parameters. They will fine-tune models on their own data, and the models they actually deploy will be much smaller. For models of this size, the NVIDIA H100, which currently sells for up to $40,000, is not necessarily the most economical option for AI inference."
He also pointed out that d-Matrix faces a window of opportunity: it has a relatively open stretch of time in which to prove its value before giants such as NVIDIA turn to this market.
For now, d-Matrix expects revenue of no more than $10 million this year, mostly from customers buying chips for evaluation. Founder Sheth said d-Matrix expects annual revenue of $70-75 million or more within two years, at which point it would break even. The market it faces is huge: Cambrian AI predicts that by 2030, the compute-per-watt of AI inference chips could exceed 1,000 TOPS per watt.
Autonomy and cost are fertile ground for AI chip startups
On the one hand, AI chip startups such as d-Matrix survive because AI companies want independent, controllable supplies of compute. Whether giants like Microsoft, Meta, and Amazon, super-unicorns like OpenAI and Anthropic, or leading startups like Cohere, none of them want their AI compute to be tied to a single supplier.
On the other hand, there is the operating cost of AI services. For large-model companies, the long-run compute cost of running AI services will exceed the compute cost of training the models, and at this stage many AI companies lose money on every user, with a high total cost of ownership (TCO). Cash-rich giants can absorb this loss, but for startups it is a heavy burden that slows the further expansion of their business.
Both the giants and the startups urgently need third-party, low-cost AI inference compute.
What risks do startups in the AI chip field face at this stage? One, of course, is NVIDIA's near-monopoly; another is that the largest AI players, Microsoft, Meta, Google, and OpenAI, are developing their own chips; and then there is the problem of building a software ecosystem around the chip.
d-Matrix is working through these problems. It targets the market for commercial small and medium-sized AI models and is cooperating with the open-source community to build a software ecosystem, which can give it a differentiated competitive advantage against the giants.