Large-model market: more than just the hot HBM
Original Source: Semiconductor Industry Vertical and Horizontal
Recently, HBM has become a hot topic in the chip industry. According to TrendForce, high-bandwidth memory (HBM) bit shipments are expected to reach 290 million GB in 2023, up roughly 60% year-on-year, and to grow a further 30% in 2024. AMD proposed the HBM concept in 2008, and SK Hynix realized it in 2013 using through-silicon via (TSV) technology. Ten years after its introduction, HBM finally seems to have entered the era of large-scale commercialization.
HBM's take-off is directly tied to the AIGC boom. AI servers demand higher bandwidth, and compared with DDR SDRAM, HBM offers more bandwidth at lower energy consumption. That ultra-high bandwidth makes HBM a core component of high-performance GPUs, and HBM is now essentially standard equipment in AI servers. HBM currently ranks as the third-largest cost item in an AI server, accounting for about 9% of the total, with an average selling price of as much as US$18,000 per server.
Since ChatGPT emerged last year, the large-model market has grown rapidly. In the domestic market, technology giants such as Baidu, Alibaba, iFLYTEK, SenseTime, and Huawei have announced in succession that they will train their own large AI models. TrendForce forecasts that by 2025 there will be 5 large AIGC products on the scale of ChatGPT, 25 mid-sized products on the scale of Midjourney, and 80 small AIGC products. Even the minimum computing resources required globally for these would amount to 145,600 to 233,700 NVIDIA A100 GPUs. All of this is potential growth space for HBM.
Since the beginning of 2023, HBM orders at Samsung and SK Hynix have grown rapidly, and HBM prices have risen; recently the price of HBM3 DRAM has increased fivefold. Samsung has received orders from AMD and Nvidia to increase HBM supply. SK Hynix has begun expanding its HBM production line, aiming to double HBM capacity. Korean media report that Samsung plans to invest about US$760 million to expand HBM production, aiming to double capacity by the end of next year, and has already placed major equipment orders.
Advantages of HBM in AIGC
Put simply, HBM raises the computing power of servers. Because AI servers must process huge volumes of data in a short time, they place higher demands on bandwidth. HBM acts like a data "staging post": it holds the data used in the frame buffer, such as each frame and image, waiting for the GPU to fetch it. Compared with traditional memory technologies, HBM offers higher bandwidth, more I/O lines, lower power consumption, and a smaller footprint, which greatly improves an AI server's data throughput and transfer rate.
In bandwidth, HBM's advantage is crushing. An HBM2E stack running at 3.6 Gbps over a 1024-bit interface delivers about 3.7 Tb/s of bandwidth, more than 18 times that of LPDDR5 or DDR4.
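The arithmetic behind that figure is simple: peak bandwidth is the interface width times the per-pin data rate. The short sketch below checks it; the HBM2E numbers come from the paragraph above, while the single-channel LPDDR5 configuration used for comparison is an illustrative assumption.

```python
# Back-of-the-envelope check of the HBM2E bandwidth figure above.

def bandwidth_gbps(bus_width_bits: int, rate_gbps_per_pin: float) -> float:
    """Peak bandwidth in Gb/s: interface width times per-pin data rate."""
    return bus_width_bits * rate_gbps_per_pin

hbm2e = bandwidth_gbps(1024, 3.6)  # 1024-bit interface at 3.6 Gbps/pin
print(f"HBM2E: {hbm2e:.0f} Gb/s = {hbm2e / 8:.0f} GB/s")  # ~3686 Gb/s, ~461 GB/s

# One x32 LPDDR5 channel at 6.4 Gbps/pin (assumed config) for comparison:
lpddr5 = bandwidth_gbps(32, 6.4)
print(f"LPDDR5 x32: {lpddr5:.0f} Gb/s; HBM2E = {hbm2e / lpddr5:.0f}x")  # 18x
```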
Beyond the bandwidth advantage, HBM also saves area, which in turn lets a system fit more GPUs: HBM consists of memory stacks placed on the same physical package as the GPU.
Price used to be a limiting factor for HBM. But the large-model market is now in a land-grab phase, and for the giants building large models, time is money. So HBM, expensive as it is, has become the new favorite of the large-model players. As demand for high-end GPUs climbs, HBM has become standard equipment in AI servers.
At present, Nvidia's A100 and H100 carry 80 GB of HBM2e and HBM3, respectively. In its latest Grace Hopper chip, which integrates a CPU and a GPU, the HBM capacity of a single chip rose by 20%, to 96 GB.
AMD's MI300 also carries HBM3. The MI300A keeps the previous generation's 128 GB capacity, while the higher-end MI300X reaches 192 GB, a 50% increase.
Google, for its part, is expected to deepen its cooperation with Broadcom in the second half of 2023 to develop its ASIC AI accelerator, the TPU, which is also planned to carry HBM memory as Google expands its AI infrastructure.
Memory vendors accelerate their push
Such lucrative prospects have the storage giants racing to build out HBM. The world's top three memory chip makers are shifting more capacity toward HBM, but because capacity adjustments take time, output is hard to ramp quickly, and HBM supply is expected to remain tight over the next two years.
The HBM market is controlled mainly by the three big DRAM makers. Unlike the DRAM market at large, which Samsung leads, SK Hynix has fared better in HBM; as noted at the outset, SK Hynix developed the first HBM product. In April 2023, SK Hynix announced the industry's first 24 GB HBM3 DRAM, which uses TSV technology to vertically stack 12 DRAM dies, each 40% thinner than existing chips, reaching the same stack height as 16 GB products. SK Hynix also plans to sample HBM3E with 8 Gbps data transfer performance in the second half of 2023 and to put it into mass production in 2024.
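The stack arithmetic implies a per-die density. A quick, hedged check (the per-die figure below is inferred from the capacities quoted above, not stated in the article):

```python
# Inferred die density for SK Hynix's 12-high 24 GB HBM3 stack.
stack_gb = 24   # total stack capacity, GB (from the article)
dies = 12       # DRAM dies stacked via TSV (from the article)
per_die_gbit = stack_gb / dies * 8
print(f"{per_die_gbit:.0f} Gb per die")  # 16 Gb dies, same density as the 8-high 16 GB stack
```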
Domestic semiconductor companies' HBM efforts mostly center on packaging and interfaces.
NationalChip Technology is researching and planning 2.5D packaging for multi-HBM memory and actively advancing Chiplet R&D and adoption. Tongfu Microelectronics' 2.5D/3D production line, once complete, will mark a domestic breakthrough in high-performance HBM packaging. BIWIN has launched high-performance memory chips and modules and will keep tracking HBM technology. Montage Technology's PCIe 5.0/CXL 2.0 Retimer chip has reached mass production; a key upgrade of its PCIe 4.0 Retimer, it gives the industry a stable, reliable, high-bandwidth, low-latency PCIe 5.0/CXL 2.0 interconnect solution.
For all HBM's promise, though, some calm is in order: HBM is still at a relatively early stage, with a long road ahead. Foreseeably, as more manufacturers pour effort into AI and machine learning, memory design complexity is climbing fast and bandwidth requirements keep rising, and that growing demand for bandwidth will continue to drive HBM's development.
HBM's heat reflects the pull of AIGC. So beyond HBM and GPUs, are there other products that can ride this new wave?
Other chips catching fire
FPGA's advantages begin to show
An FPGA (Field Programmable Gate Array) is an integrated circuit with programmable logic elements, memory, and interconnect resources. Unlike an ASIC (Application-Specific Integrated Circuit), an FPGA offers flexibility, customizability, parallel processing capability, and easy upgradability.
Through programming, users can change an FPGA's application at any time, and an FPGA can emulate the parallel operations of CPUs, GPUs, and other hardware, which is why the industry calls it a "universal chip."
FPGAs make sense for AI inference workloads whose underlying models change frequently, where their programmability outweighs the normally unfavorable per-unit economics of FPGAs. To be clear, FPGAs will not seriously compete with large AI systems built on thousands of GPUs, but as AI penetrates deeper into electronics, the range of FPGA applications will widen.
FPGA's advantage over the GPU is lower power consumption and latency. A GPU makes poor use of on-chip memory and must read off-chip DRAM frequently, so its power consumption is high; an FPGA can use on-chip storage flexibly, so its power consumption is far lower.
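A rough sense of why off-chip traffic matters: each off-chip DRAM access costs orders of magnitude more energy than an on-chip access. The sketch below illustrates the gap using assumed per-bit energies (round figures in the spirit of published estimates, not vendor data) and a hypothetical 100 MB weight set.

```python
# Illustrative energy gap between off-chip DRAM and on-chip SRAM traffic.
# The per-bit energies are assumed round numbers, not measured values.

DRAM_PJ_PER_BIT = 20.0   # assumed: off-chip DRAM access energy
SRAM_PJ_PER_BIT = 0.2    # assumed: on-chip SRAM access energy

def transfer_energy_mj(bytes_moved: int, pj_per_bit: float) -> float:
    """Energy (millijoules) to move bytes_moved at pj_per_bit."""
    return bytes_moved * 8 * pj_per_bit * 1e-9  # 1 pJ = 1e-9 mJ

weights = 100 * 1024 * 1024  # hypothetical 100 MB of model weights

# GPU-style pattern: weights re-read from off-chip DRAM every pass.
# FPGA-style pattern: weights pinned in on-chip storage after one load.
print(f"off-chip per pass: {transfer_energy_mj(weights, DRAM_PJ_PER_BIT):.2f} mJ")
print(f"on-chip per pass:  {transfer_energy_mj(weights, SRAM_PJ_PER_BIT):.2f} mJ")
```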
On June 27, AMD announced the Versal Premium VP1902 adaptive system-on-chip (SoC), an FPGA-based adaptive SoC. It is an emulation-grade, chiplet-based device that simplifies verification of increasingly complex semiconductor designs. AMD says the VP1902 will be the world's largest FPGA: compared with the previous generation (Xilinx VU19P), the new VP1902 adds Versal features and adopts a chiplet design, more than doubling the FPGA's key performance.
A Dongxing Securities research report argues that FPGA holds a clear edge in AI inference thanks to the latency and power advantages of its architecture, and an earlier Zheshang Securities report likewise noted that, beyond GPUs, CPU+FPGA solutions can also meet AI's enormous demand for computing power.
Unlike HBM, which overseas companies monopolize, domestic companies already have an accumulation of FPGA chip expertise.
Anlu Technology's main business is the R&D, design, and sale of FPGA chips and dedicated EDA software, and its products are widely used in industrial control, network communications, consumer electronics, and other fields. Ziguang Tongchuang, a subsidiary of Ziguang Guowei, is a dedicated FPGA company that designs and sells general-purpose FPGA chips; Ziguang Guowei has said at an earnings briefing that its FPGA chips can serve the AI field. Dongtu Technology focuses on industrializing FPGA chips, and the Zhongke Yihai Micro team, in which it holds a stake, has independently developed EDA software to support application development on its FPGA products.
A new idea for domestic substitution: compute-in-memory + Chiplet
Can currently available processes and technologies be used to develop AI chips that rival Nvidia's in performance? Some "new ideas" have emerged, such as compute-in-memory + Chiplet.
Separating storage from compute creates a computing power bottleneck. With AI technology advancing rapidly, demand for computing power has exploded. In the post-Moore era, memory bandwidth constrains a computing system's effective bandwidth, and growth in system computing power is a struggle: for example, training the BERT model from scratch on 8 NVIDIA GTX 1080 Ti cards takes 99 days. A compute-in-memory architecture has no deep multi-level storage hierarchy; all computation is done inside the memory, eliminating the "memory wall" that separate storage and compute create, along with its overhead. With the memory wall gone, data movement drops sharply, which not only speeds up data transfer and processing but also improves energy efficiency severalfold.
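The "memory wall" can be made concrete with a roofline-style model: attainable performance is the lesser of peak compute and memory bandwidth times arithmetic intensity. The sketch below uses assumed hardware numbers, not any specific chip's specs, to show how low-intensity workloads end up bandwidth-bound.

```python
# Minimal roofline sketch of the memory wall described above.
# Hardware numbers are illustrative assumptions, not vendor specs.

PEAK_TFLOPS = 100.0   # assumed peak compute, TFLOP/s
MEM_BW_GBPS = 900.0   # assumed off-chip memory bandwidth, GB/s

def attainable_tflops(flops_per_byte: float) -> float:
    """Roofline: min(peak compute, bandwidth x arithmetic intensity)."""
    return min(PEAK_TFLOPS, MEM_BW_GBPS * flops_per_byte / 1000.0)

for intensity in (1, 10, 100, 1000):  # FLOPs per byte of DRAM traffic
    print(f"intensity {intensity:4d} FLOP/B -> {attainable_tflops(intensity):6.1f} TFLOP/s")

# Compute-in-memory attacks the left side of this curve: by doing the
# arithmetic where the data lives, it raises effective bandwidth so that
# low-intensity workloads are no longer pinned to the memory roof.
```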
On the one hand, a compute-in-memory processor consumes less power than a traditional-architecture processor delivering the same computing power; on the other, it opens up the "compilation wall" of the traditional architecture.
In 2021, scholars at Arizona State University released SIAM, a benchmark simulator for Chiplet-based in-memory computing (IMC) architectures, to evaluate this new architecture's potential for large AI model training. SIAM integrates device, circuit, architecture, network-on-chip (NoC), network-in-package (NoP), and DRAM access models to model an end-to-end high-performance computing system. It scales across deep neural networks (DNNs) and can be customized for various network structures and configurations. The research team demonstrated SIAM's flexibility, scalability, and simulation speed by benchmarking advanced DNNs on the CIFAR-10, CIFAR-100, and ImageNet datasets. Reportedly, the chiplet+IMC architecture explored through SIAM improves the energy efficiency of ResNet-50 on ImageNet by 130x and 72x relative to NVIDIA's V100 and T4 GPUs, respectively.
This suggests that compute-in-memory AI chips can achieve heterogeneous integration via Chiplet technology and 2.5D/3D stacked packaging, forming large-scale computing systems. Compute-in-memory + Chiplet looks like a feasible route: Yizhu Technology is said to be exploring it, and its first-generation compute-in-memory AI chip for large computing power reportedly delivers more than 500 TOPS per card within 75 W. Perhaps this will open the prelude to a second growth curve for AI computing power.
Conclusion
At the World Artificial Intelligence Conference, AMD CEO Lisa Su said that a large-scale computing supercycle is coming over the next ten years, making this a good time to be a technology supplier, and a good time to be working with the customers who will use these technologies to develop new applications.
No one wants an industry with a single dominant player. Can the large-model market give the chip industry a new competitive structure, and can new players emerge?
"The large model market has brought new market patterns and opportunities to the chip industry. By promoting the development of AI chips, promoting the growth of cloud computing and data center markets, and triggering changes in the competitive landscape, the rise of large models has brought new opportunities to the chip industry. direction of development.
It should be noted that the chip industry is a highly competitive and technology-intensive industry. Entering the industry requires substantial financial and technical resources to meet complex manufacturing and R&D requirements. Although the large-scale model market provides opportunities for new players, they need to overcome technical, financial and marketing challenges to succeed in the highly competitive chip industry. "Chatgpt responded.