Computing power carnival: who is the "Chinese version" of Nvidia?

**Source:** Core Tide IC

Text: Wang Yike, Ten Alleys

Editor: Su Yang, Xubai

* "Core things are heavy" Tencent Technology's semiconductor industry research plan, this issue of Core Tide IC and Tencent Technology, focusing on behind the explosion of large models, the formation of a new pattern in the global chip computing power market, the layout of leading companies and the growth of domestic manufacturers chase. *

The AI revolution unexpectedly set off by ChatGPT has once again ignited the AI chip market.

"Chips such as A800 and H800 have changed from about 120,000 RMB to 250,000 or even 300,000, or even as high as 500,000." This is a real scene in the domestic chip distribution circle. Skyrocketing, major domestic manufacturers want to get chips in large quantities, and they have to have a "direct relationship" with Huang Renxun.

As the saying goes, "no chip, no AI": as large models' demand for computing power soars, chips, the foundation of AI technology, meet a major business opportunity. OpenAI once predicted that for artificial intelligence research to keep making breakthroughs, the computing resources consumed would have to double every 3 to 4 months, with funding growing exponentially to match, a pace some have called the "Moore's Law" of AI.
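For a sense of scale, here is a back-of-the-envelope sketch. The doubling period is an assumption taken from OpenAI's widely cited "AI and Compute" analysis (roughly 3.4 months), not a figure stated in this article:

$$C(t) = C_0 \cdot 2^{t/T}, \qquad T \approx 3.4 \text{ months} \;\Rightarrow\; C(12\text{ months}) = 2^{12/3.4}\, C_0 \approx 11.5\, C_0$$

In other words, at that pace the compute consumed by frontier AI research grows by roughly an order of magnitude per year, far faster than the roughly 2x every 18 to 24 months of the original Moore's Law.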

Nvidia CFO Colette Kress said that current market demand for AI computing power has exceeded the company's expectations for the next few quarters, and there are more orders than it can fulfill.

The wave of generative AI has made Nvidia a fortune. Twenty-four years after listing, Nvidia has entered the trillion-dollar market capitalization club. For comparison, reaching that milestone took Apple 37 years from its IPO and Microsoft 33 years; Amazon took 21 years, and Tesla, the fastest, only 11 years.

This has also made Chinese chip companies eager to try. Domestic firms such as Haiguang Information, Cambricon, Loongson Zhongke, Biren Technology, and Tianshu Zhixin all harbor ambitions of becoming the "Chinese version" of Nvidia, hoping to empower domestic large models with self-developed chips. Some major manufacturers have also begun using self-developed AI chips to support part of their models' training or inference workloads, such as Baidu's Kunlun chips, Alibaba's Hanguang 800...

Faced with the trillion-level market opened up by AI computing power, can domestic companies enjoy this wave of dividends? And how should domestic chip manufacturers scale the "mountain" that is Nvidia? These are questions no company can escape.

01. The AI frenzy propels Nvidia to a trillion-dollar market value

The man who loves leather jackets was the first to reap AI's dividends.

At the end of 2022, after ChatGPT came out, it quickly set off an AI frenzy around the world. Nvidia, which had long been betting on the future of AI, became one of the companies benefiting most from the ChatGPT wave. At this year's Nvidia GTC conference, founder and CEO Jensen Huang unveiled new artificial intelligence and chip technology, declaring that the "iPhone moment" for artificial intelligence has arrived.

At the event, Huang said that deploying large language models similar to ChatGPT constitutes an important new inference workload. To support it, Nvidia released a series of products and services around AI, among which the H100 chip, with its new architecture and more advanced process, drew the most attention.

Source: NVIDIA official website

The GPU in question is the H100, based on the NVIDIA Hopper architecture and equipped with a Transformer Engine designed to process and accelerate ChatGPT-style pre-trained models. According to Nvidia, a standard server with four pairs of H100s and dual-GPU NVLink can train GPT-3 up to 10 times faster than an HGX A100.

"H100 can reduce the processing cost of large language models by an order of magnitude." Huang Renxun once said. Based on the H100 chip, Nvidia has also built the latest DGX supercomputer, equipped with 8 H100 GPUs, so that they can be connected to form a huge GPU, providing a "blueprint" for the construction of AI infrastructure. At present, the new DGX supercomputer has been fully put into production.

After that, Nvidia's high-performance GPUs such as the A100, H100, A800 and H800 rose in price accordingly. The flagship H100 in particular sold for more than US$40,000 on overseas e-commerce platforms in mid-April, with some sellers even asking US$65,000.

At the same time, Nvidia's China-specific A800 and H800 chips have also been snapped up. "It is basically difficult for domestic large-model companies to obtain these chips. Demand across the entire market outstrips supply, and the shortage is severe," Zhang Jue, founder of electronic component procurement supplier Guangxin Century, told Xinchao IC frankly. "This year, this type of GPU chip has jumped from about 120,000 RMB to 250,000 or even 300,000 RMB, and sometimes as high as 500,000 RMB."

There is no doubt that Nvidia's technological lead in high-performance GPUs, embodied in its two AI chips, the A100 and H100, is the core driving force behind large language models like ChatGPT.

Some cloud computing professionals believe that 10,000 Nvidia A100 chips mark the computing power threshold for a good AI model. The AI supercomputer Microsoft built for OpenAI to train its models is equipped with 10,000 Nvidia GPUs. Major domestic Internet companies have also placed large orders: according to a LatePost report, ByteDance has ordered more than US$1 billion worth of GPUs from Nvidia this year, and another large company that cannot be named has placed an order of at least 1 billion RMB.

What is even more striking is that whether these companies can actually get the cards depends largely on business relationships, especially on whether a company was a major Nvidia customer in the past. "It makes a difference whether you talk to Nvidia China or fly to the United States to talk to Lao Huang (Jensen Huang) in person."

As a result, Nvidia's financial results climbed to new highs. On May 25, it released its first-quarter report: revenue from the data center business, home to its AI chips, hit a record high, maintaining year-on-year growth of more than 10%.

Huang revealed that the entire data center product line is now in production and that supply is being ramped up sharply to meet the surge in demand.

A string of good news pushed Nvidia's stock price ever higher. On the evening of May 30, as the U.S. market opened, Nvidia's market value broke through US$1 trillion. On July 19, its total market value soared by US$175 billion overnight, triggering yet another investment boom.

According to the companiesmarketcap website, Nvidia's total market value ranks sixth in the world and first among chip companies, close to two TSMCs (US$533.6 billion). Its share price has risen about 180% this year. One has to admit: this wave of AI frenzy has filled the coffers of Jensen Huang's Nvidia.

02. Nvidia will not enjoy the computing power frenzy alone

"Nvidia won't have a monopoly on large-scale training and inference chips forever."

That was Tesla CEO Elon Musk's response to a tweet from Adam D'Angelo, CEO of the question-and-answer site and online knowledge marketplace Quora, who wrote: "One reason the AI boom is underappreciated is the GPU/TPU shortage, which has led to all kinds of restrictions on product launches and model training, yet none of these were apparent. Instead, we saw Nvidia's stock price soar. Once supply meets demand, things will accelerate."

Clearly, the Silicon Valley "Iron Man" disagrees. He also commented: "Many other neural network accelerator chips are also under development, and Nvidia will not monopolize large-scale training and inference forever."

A storm is coming.

How large a computing power market can the large-model AI frenzy drive? Soochow Securities believes that AI models' demand for computing power keeps expanding, opening up market demand for high-performance computing chips: it estimates that China's AI chip market will reach 178 billion RMB in 2025, with a compound annual growth rate of 42.9% from 2019 to 2025. Measured by market size, AI chips are still in their infancy but have huge growth potential.

"AI chip" is a broad concept: it generally refers to modules specialized in handling the computing tasks of artificial intelligence applications; in other words, any chip serving AI applications can be called an AI chip, hardware born of the era of rapidly developing AI. There are three main technical routes: general-purpose (GPU), semi-custom (FPGA), and fully custom (ASIC).

Across large-model training, scenario-specific fine-tuning, and inference application scenarios, the heterogeneous computing power supplied by CPUs plus AI chips, with its superior parallel computing capability and high interconnect bandwidth, can maximize AI computing efficiency and has become the mainstream solution for intelligent computing.

In terms of market size, iResearch estimates that China's AI chip market will reach 216.4 billion RMB by 2027. As deployed AI models are optimized, AI inference chips' share will grow by the day: in 2022, China's AI training chips and AI inference chips accounted for 47.2% and 52.8% of the market respectively.

At present, there are three types of players in the AI chip field: first, the old-line chip giants represented by Nvidia and AMD, with outstanding product performance; second, the cloud computing giants represented by Google, Baidu, and Huawei, which have developed large models and built AI chips and deep-learning platforms to support them, such as Huawei's Kunpeng and Ascend chips with CANN and MindSpore, or Baidu's Kunlun Core; and finally the small-but-excellent AI chip unicorns, such as Cambricon, Biren Technology, and Tianshu Zhixin.

Although the explosion of domestic large models may open up a computing power gap, it is only a matter of time before domestic chip manufacturers enjoy the dividends of domestic substitution. Cambricon, the "first AI chip stock" and a developer of AI training chips, has once again drawn market attention; its stock price has kept climbing, and its market value recently exceeded 90 billion RMB.

In its cloud product line, Cambricon has launched four generations of chips: the Siyuan 100 in 2018, the Siyuan 270 in 2019, the Siyuan 290 (vehicle-grade) in 2020, and the Siyuan 370 series released in 2021, supporting artificial intelligence processing tasks whose complexity and data throughput are growing rapidly in cloud computing and data center scenarios. Cambricon also has an unreleased product in development, the Siyuan 590. In addition, by the end of 2022 the Siyuan 370 series had completed Level II compatibility testing with Baidu's PaddlePaddle (Flying Paddle) platform.

However, there is still no reliable information on whether domestic large-model companies have adopted Cambricon chips. "In the field of high-end AI chips, domestic manufacturers are in their infancy, and many things need time and money to verify," a senior chip engineer revealed. Even chips from companies such as Huawei, Baidu, and Haiguang Information show a clear gap with Nvidia's products.

Someone once said frankly that the gap between Nvidia and other chip makers is the gap between an academician and a high school student. As Huang himself put it, Nvidia "has been running"; any chip maker that wants to overtake the giant can only sprint after it.

03. The "Game of Thrones" behind large AI models

Besides Nvidia, AMD, the other GPU giant, has also made its move recently.

At the launch event for its latest accelerator card, the Instinct MI300X, AMD put a single line on the slide: dedicated to large language models. The industry took this as a direct declaration of war on Nvidia!

Reportedly, the MI300X's high-bandwidth memory (HBM) capacity is up to 2.4 times that of the Nvidia H100, and its HBM bandwidth up to 1.6 times the H100's. Clearly, the MI300X can run larger AI models than the H100 can.

The MI300 series to which the MI300X belongs is AMD's latest line of APU accelerator cards for AI and HPC. The MI300A is the "base model", while the MI300X is the higher-performance variant optimized for large models.

The MI300A is already sampling and should be available for purchase soon; the large-model-dedicated MI300X, along with the AMD Instinct platform that integrates eight MI300X cards, is expected to sample in the third quarter of this year and launch in the fourth quarter.

Over the past few years, compared with Nvidia's big moves in AI, AMD's actions have seemed a bit slow. As Eric Jang, CEO of DeepBrain AI, has said, he feels AMD has disappointed him in recent years, with nothing changing in the past five. If AMD does not push hard to keep up, especially during the AIGC explosion, the gap will only widen.

With the launch of AMD's MI300 series products, we can finally see AMD and Nvidia fighting head-on.

Unfortunately, the market doesn't seem to be buying AMD's new cards.

During AMD's launch event, its stock price fell rather than rose, while Nvidia's rose another notch. The market's mood is not hard to understand: in high-tech fields, especially emerging markets, "the strong stay strong" has become the common commercial logic.

In fact, a closer look at the causes shows that the main reason Nvidia monopolizes the artificial intelligence training chip market is its self-developed CUDA ecosystem. For the AMD MI300 to displace Nvidia, it first needs to be compatible with Nvidia's CUDA ecosystem. AMD launched the ROCm ecosystem for this purpose, achieving compatibility with CUDA through HIP and thereby reducing users' migration costs.
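To make the HIP route concrete, here is a minimal, hypothetical sketch (not code from AMD's materials) of why porting is mechanical in principle: HIP's runtime API and kernel-launch syntax deliberately mirror CUDA's almost one to one (hipMalloc for cudaMalloc, the same triple-chevron launch), and AMD's "hipify" tools automate much of the renaming:

```cpp
// Minimal HIP sketch: a vector-add kernel. Anyone who has read CUDA code
// will recognize every line; only the hip* prefixes differ.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;  // one million elements
    float *a, *b, *c;
    hipMallocManaged(&a, n * sizeof(float));  // unified memory, like cudaMallocManaged
    hipMallocManaged(&b, n * sizeof(float));
    hipMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vadd<<<(n + 255) / 256, 256>>>(a, b, c, n);  // same launch syntax as CUDA
    hipDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    hipFree(a); hipFree(b); hipFree(c);
    return 0;
}
```

The hard part, as the next paragraph argues, is not this mechanical translation but keeping such a layer current and fully compatible while CUDA itself keeps moving.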

On this point, the well-known investment blogger Murong Yi argues that the difficulty of the CUDA-compatible route is that its update pace can never keep up with CUDA, and full compatibility is nearly impossible to achieve. On the one hand, iteration always lags a step behind: Nvidia GPUs iterate quickly on microarchitecture and instruction set, and corresponding function updates are needed at many points higher in the software stack, but AMD cannot know Nvidia's product roadmap in advance, so its software updates will always be one step slower (for example, AMD may have just announced support for CUDA 11 while Nvidia has already launched CUDA 12). On the other hand, full compatibility multiplies the developers' workload: the architecture of large-scale software like CUDA is extremely complicated, AMD would need to invest enormous manpower and material resources and take years, even more than a decade, to catch up, and since functional differences are inevitable, imperfect compatibility drags down performance. These are the key reasons why the market is not buying it.

According to estimates by Khaveen Investments, Nvidia's data center GPU market share reached 88% in 2022, with AMD and Intel splitting the remainder.

Since OpenAI released ChatGPT last year, a new round of technological revolution has kept fermenting. It is fair to say that no technological advance in many years has drawn as much global attention as ChatGPT.

Technology companies, research institutions, and universities at home and abroad are all following up. In less than half a year, a crowd of startups building large-model applications has emerged, and financing rounds have repeatedly hit new highs.

According to the blogger wgwang, major domestic companies, startups, and research institutions, including Baidu, iFLYTEK, 4Paradigm, Tsinghua University, and Fudan University, have successively released large-model products:

Source: Zhihu wgwang

It is clear that technology companies are releasing vertical-domain large models not only for general use but also for specific industry scenarios, especially fields with strong professional requirements and high knowledge density. For example, Baijiayun (RTC), a US-listed company, recently released the AIGC product "Market Easy" based on its insight into enterprises' service needs, reportedly the first GPT large-model engine tailored to the content production scenarios of enterprise marketing departments.

Some industry insiders quipped: "Domestic large models have fallen into a wild dance of models, a war of a hundred models. By the end of the year there may be more than 100 of them."

However, developing large models requires the support of three key factors: algorithms, computing power, and data. Computing power is the energy engine of large-model training, and in China it is also a major barrier to the industry's development.

Chip capability directly affects the results and speed of high-compute training. As mentioned above, despite the frequent launches of domestic large-model products, a look at the chips behind them shows that all of these platforms use either Nvidia's A100 and H100 GPUs, or the reduced-spec A800 and H800 that Nvidia introduced specially for China after last year's export ban; the interconnect bandwidth of the latter two processors is roughly three-quarters and one-half that of the original versions respectively, keeping them under the control threshold for high-performance GPUs.

In March this year, Tencent took the lead in announcing its use of the H800: the new high-performance computing service released by Tencent Cloud already uses it, which Tencent said was a first in China.

Alibaba Cloud also proposed internally in May that the "Battle for Smart Computing" would be its number one campaign this year, with GPU count as a key metric of the campaign.

In addition, SenseTime (Shangtang) announced that nearly 30,000 GPUs are deployed in its "AI large device" computing cluster, 10,000 of them Nvidia A100s. ByteDance and Meituan have even reallocated GPUs from other business teams for large-model training. Since the second half of 2022, some manufacturers have been scouring the market for complete-machine products from which A100s can be pulled, with the sole purpose of obtaining the GPU chips. "There are too many machines and not enough places to store them."

It is understood that leading domestic technology companies have invested heavily in AI and cloud computing; their past accumulation of A100s runs into the tens of thousands.

At the same time, China's major technology companies are still engaged in a new round of procurement competition.

According to a cloud service provider, big companies such as ByteDance and Alibaba mainly negotiate purchases directly with Nvidia, since agents and second-hand markets can hardly satisfy their enormous needs.

As mentioned above, ByteDance has ordered more than US$1 billion worth of GPU products from Nvidia this year; ByteDance's purchases alone this year approach the total commercial GPU sales Nvidia recorded in China last year. According to reports, another large company has placed an order of at least 1 billion RMB.

It is clear how urgently China's big technology companies want to purchase GPUs.

Not only domestic companies: major foreign customers also have very strong demand for Nvidia's A100/H100 chips. According to statistics, Baidu, which began testing ChatGPT-like products earliest, has had annual capital expenditure of US$800 million to US$2 billion since 2020, and Alibaba's has ranged from US$6 billion to US$8 billion. Over the same period, Amazon, Meta, Google, and Microsoft, the four American technology companies that build their own data centers, each spent at least US$15 billion a year.

At present, Nvidia's order visibility extends into 2024, and high-end chips are in short supply. On the current schedule, even the A800/H800 will not be delivered until the end of this year or next year. In the short term, judging by its popularity, the only constraint on Nvidia's high-end GPU sales may be TSMC's production capacity.

04. Behind a "red-hot" Nvidia, are domestic chips falling short in both hardware and software?

Judging from the chips supplied for large-model products, there are currently no substitutes for the A100 and H100, or the reduced-spec A800 and H800 supplied specially to China, for training large AI models.

So why, in this round of the GPT boom, did Nvidia take the lead and perform so well?

Zhang Gaonan, managing partner of Huaying Capital, said that on the one hand Nvidia started earliest, and its micro-kernel architecture has evolved and improved generation after generation. Whether in concurrency, bus speed, or the micro-kernels' mature support for matrix transformation, its capabilities are already highly efficient, and it offers the very complete CUDA computing platform alongside the hardware, which has become a de facto industry standard for deep learning algorithms. The supporting facilities of the whole industrial chain are also complete, so the overall competitive barrier and the depth of the moat are extremely high.

In short, the current irreplaceability of Nvidia GPUs stems from the training mechanism of large models. Its core steps are pre-training and fine-tuning: the former lays the foundation, like a general education through university; the latter optimizes for specific scenarios and tasks to improve performance on the job.

So, can domestic GPU chips support the computing power requirements of large models?

In practical applications, a large model's demand for computing power falls into two phases: first, training the ChatGPT-like large model; second, the inference process of commercializing it. In other words, AI training makes the model and AI inference uses it, and training places the higher demands on chip performance.

Against this backdrop, domestic AI chip companies keep emerging and releasing products one after another. Companies such as Suiyuan Technology, Biren Technology, Tianshu Zhixin, and Cambricon have all launched their own cloud GPU products, with theoretical performance figures that are not weak. Haiguang Information's DCU chip "Shensu No. 1" has a relatively complete software and hardware ecosystem and is compatible with the CUDA architecture. Big Internet companies such as Tencent, Baidu, and Alibaba are also deploying heavily in the AI chip field through investment and incubation.

Among these workloads, large-model training must process high-granularity information, which demands higher precision and computing speed of cloud training chips. At present most domestic GPUs lack the capability for large-model training; they are suited instead to cloud inference work that does not require such fine-grained information.

*AI products and applications of some related domestic companies, compiled by Core Tide IC from public information*

In March this year, Baidu's Li Yanhong (Robin Li) publicly stated that the Kunlun chip is now well suited to inference for large models and will become suitable for training in the future.

Zou Wei, vice president of Tianshu Zhixin, also told Xinchao IC that a certain gap remains between domestic chips and Nvidia's latest products, but in inference computing domestic chips can already match mainstream products in performance. As artificial intelligence applications spread, market demand for inference chips will accelerate, and as demand expands, domestic chips will command a larger market.

Another industry figure, who asked not to be named, said: "Domestic general-purpose GPU products do lag international flagship products in large-model training, but the gap is not unbridgeable. It is just that the industry has not yet aimed its product definitions at large models."

Industry practitioners are now making relevant explorations and efforts, for example examining whether chip computing power can be raised through chiplets and advanced packaging. Domestic GPU companies are already developing and positioning chips for the large-model field.

From the capital side, Zhang Gaonan, managing partner of Huaying Capital, told Xinchao IC that Huaying has long paid close attention to computing infrastructure, with targeted research and positioning across GPUs, DPUs, and more cutting-edge areas such as photoelectric hybrid computing and quantum computing. On the whole it focuses on general-purpose computing infrastructure such as FPGA and edge computing; by contrast, the many computing chips built around deep learning, special-purpose algorithms, and local computing power optimization are not currently its focus.

In fact, in addition to the gap in hardware performance, the software ecosystem is also a shortcoming of domestic AI chip manufacturers.

A chip must adapt at multiple levels, including the hardware system, toolchain, and compiler, and needs strong adaptability; otherwise a chip that can deliver 90% of its computing power in one scenario may manage only 80% in another.

As mentioned above, Nvidia has obvious advantages here. As early as 2006, Nvidia launched CUDA, a parallel computing software engine. The CUDA framework packages much of the code required to invoke GPU computing power, and engineers can use it directly instead of writing everything from scratch. With CUDA, developers can run AI training and inference more efficiently and squeeze more out of GPU computing power. Today CUDA has become part of AI infrastructure: mainstream AI frameworks, libraries, and tools are all developed on top of it.
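As a minimal illustration of what that abstraction buys (a generic SAXPY example of our own, not code from any framework named here), note how allocating GPU memory and launching a million parallel threads reduce to a few lines of C++:

```cpp
// Minimal CUDA sketch: y = a*x + y over a million elements.
// CUDA hides the driver-level work; the programmer writes one kernel
// and one launch line.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // memory visible to CPU and GPU
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);  // 4096 blocks of 256 threads
    cudaDeviceSynchronize();                          // wait for the GPU to finish

    printf("y[0] = %f\n", y[0]);  // expect 4.0
    cudaFree(x); cudaFree(y);
    return 0;
}
```

Frameworks such as PyTorch and TensorFlow invoke thousands of far more elaborate kernels of this kind through CUDA and its libraries, which is precisely the accumulated software capital any competitor must reproduce.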

Without this coding layer, it is extremely difficult for software engineers to realize the hardware's value.

If GPUs and AI chips other than Nvidia's want to tap the CUDA ecosystem, they must provide their own adaptation software. According to one industry insider who had dealt with a non-Nvidia GPU manufacturer, although that vendor's chip and service quotes were lower than Nvidia's and it promised more timely service, the overall training and development costs of using its GPUs would still exceed Nvidia's, and the customer would also have to bear uncertainty in results and development time.

Although Nvidia GPUs are expensive, they are actually the cheapest to use. For companies intent on seizing the large-model opportunity, money is often not the problem; time is the more precious resource. Everyone must obtain enough advanced computing power as soon as possible to secure a first-mover advantage.

Therefore, for domestic chip suppliers, even if a product with comparable computing power could be assembled by stacking chips, customers find the software adaptation and compatibility harder to accept. Moreover, from the standpoint of server operations, motherboard costs, electricity, operating expenses, and issues such as power consumption and heat dissipation would all greatly increase a data center's operating costs.

Because computing power resources often need to be presented in pooled form, data centers usually prefer to use the same chip, or chips from the same company, to reduce the difficulty of pooling.

Unlocking computing power requires complex software-hardware cooperation to turn a chip's theoretical computing power into effective computing power. For customers, adopting domestic AI chips is not easy: swapping out cloud AI chips entails real migration costs and risks, and unless the new product has a performance advantage or can solve a problem in some dimension that others cannot, customers' willingness to switch is very low.

As the only GPU supplier that can actually handle ChatGPT-scale workloads, Nvidia is the undisputed "king of AI computing power". Six years ago, Huang personally delivered the first DGX supercomputer to OpenAI, helping it go on to create ChatGPT and become a leader of the AI era.

However, since the United States introduced export controls last year, Nvidia has been banned from exporting its two most advanced GPU chips, the H100 and A100, to China. This is undoubtedly a blow to downstream application companies.

From the perspective of security and self-reliance, this also offers domestic chip companies a new window of opportunity. Although domestic chips trail industry giants such as Nvidia and AMD in performance and software ecosystem, driven by complex international trade relations and geopolitical factors, "domestic substitution" has become the main theme of the domestic semiconductor industry's development.

05. Conclusion

Every leap in computing power sets off a wave of technological and industrial change: CPUs led mankind into the PC era, mobile chips set off the mobile Internet wave, and AI chips are breaking the decades-long computing power bottleneck of the AI industry.

Today, the "iPhone moment" for AI has arrived, and the road to the next era may already be laid before us.

Although AI chips and software systems in these data centers are still dominated by foreign manufacturers, the market door to "localized computing power" may be opening now.
