Wang Xiaochuan announces latest large model, claimed to have the world's longest context window, 14 times that of GPT-4

Original source: Titanium Media

Author: Lin Zhijia

Image source: Generated by Unbounded AI

The competition of domestic large-scale model technology has accelerated, and after the launch of the latest products by iFLYTEK Xinghuo and Zhipu, Baichuan has also ushered in new large-scale model achievements.

Titanium Media learned that on the morning of October 30, the AI large-model company Baichuan Intelligence, founded by Wang Xiaochuan, announced the launch of the Baichuan2-192K large model, whose context window length reaches 192K, enough to process about 350,000 Chinese characters.

**Baichuan Intelligence said that Baichuan2-192K currently has the longest context window in the world: 4.4 times that of Claude2, previously the best model supporting long context windows (a 100K context window, about 80,000 characters in practice), and 14 times that of GPT-4 (a 32K context window, about 25,000 characters in practice).** The company claims it not only surpasses Claude2 in context window length, but also leads it in long-window text generation quality, long-context understanding, and long-text Q&A and summarization.

It is reported that Baichuan2-192K will be provided to enterprise users via API calls and private deployment. Baichuan Intelligence has launched an internal beta of the model's API and opened it to core partners in industries such as law, media, and finance.

Baichuan Intelligence was founded on April 10, 2023 by Wang Xiaochuan, founder and former CEO of Sogou. Its core team is composed of top AI talent from well-known technology companies such as Sogou, Google, Tencent, Baidu, Huawei, Microsoft, and ByteDance. The team currently numbers more than 170 people, of whom nearly 70% hold a master's degree or above and more than 80% are R&D personnel.

In the past 200 days, Baichuan Intelligence has released a large model every 28 days on average: four open-source, free-for-commercial-use models (Baichuan-7B/13B and Baichuan2-7B/13B) and two closed-source models (Baichuan-53B and Baichuan2-53B), whose capabilities in writing, text creation, and other fields have reached a strong level in the industry. The two open-source model families, Baichuan-7B and 13B, rank among the best on many authoritative evaluation leaderboards and have been downloaded more than 6 million times in total.

On the question of building a large-model company, Wang Xiaochuan has said that his team can build large models with its existing technical tools, and that the company's real competitors are the open-source offerings of large companies. He also believes the team does not need to be very big: 100 people are enough.

On August 31, Baichuan Intelligence was among the first to be registered under the national "Interim Measures for the Management of Generative Artificial Intelligence Services", and was the only large-model startup founded this year among the first eight approved companies. On September 25, it opened the Baichuan2-53B API, officially entering the To-B enterprise market and beginning commercialization.

On October 17, Baichuan Intelligence announced it had completed a US$300 million A1 strategic financing round, with participation from technology giants including Alibaba, Tencent, and Xiaomi, as well as a number of top investment institutions. Together with its US$50 million angel round, Baichuan Intelligence's cumulative financing has reached US$350 million (about 2.543 billion yuan).

Baichuan Intelligence did not disclose its current valuation, saying only that after this round of financing it has become a technology unicorn. By the common definition, a unicorn is valued at more than US$1 billion (about 7.266 billion yuan).

**At the launch of Baichuan2-192K, Baichuan Intelligence said the model performed well on 10 Chinese and English long-text Q&A and summarization benchmarks, including DuReader, NarrativeQA, LSHT, and TriviaQA, achieving SOTA on 7 of them, significantly surpassing other long-window models and leading Claude2 across the board.**

Baichuan pointed out that it is an industry consensus that expanding the context window effectively improves large-model performance, but an ultra-long context window also means higher computing requirements and greater memory pressure. The industry currently has many ways to lengthen the context window, including sliding windows, downsampling, and small models. While these methods do extend the window, they all impair model performance to varying degrees; in other words, they trade away other aspects of model capability in exchange for a longer context window. Baichuan2-192K, by contrast, balances window length and model performance through algorithmic and engineering optimization, improving both simultaneously.
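To make the sliding-window trade-off mentioned above concrete, here is a minimal sketch of a sliding-window attention mask. This is a generic illustration of the technique, not Baichuan's implementation (which the article does not detail): each token attends only to a fixed-size window of recent tokens, which caps attention cost but discards distant context, the performance loss the article describes.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal attention mask where each token attends only to the
    previous `window` tokens (itself included).

    This caps attention cost at O(seq_len * window) instead of
    O(seq_len^2), but any token farther back than `window` positions
    is invisible -- long-range context is simply dropped.
    """
    i = np.arange(seq_len)[:, None]  # query positions (rows)
    j = np.arange(seq_len)[None, :]  # key positions (columns)
    # attend only when j <= i (causal) and i - j < window (local)
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=6, window=3)
# token 5 can attend to tokens 3, 4, 5, but tokens 0-2 are masked out
```

A full-context model like the one described would instead use the plain causal mask `j <= i`, paying the quadratic cost that makes 192K windows demanding in compute and memory.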

On the algorithmic side, Baichuan Intelligence proposes an extrapolation scheme for RoPE and ALiBi dynamic position encoding, which strengthens the model's ability to capture long-range dependencies while preserving resolution; as the window length grows, Baichuan2-192K's sequence-modeling ability continues to improve. On the engineering side, building on its self-developed distributed training framework, Baichuan Intelligence integrates and optimizes multiple techniques into a comprehensive 4D-parallel distributed solution that automatically finds the most suitable distributed strategy for a given model load, greatly reducing memory usage during long-window training and inference.
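For readers unfamiliar with ALiBi, the position encoding named above, here is a minimal sketch of the standard ALiBi bias (from the public literature; Baichuan's specific extrapolation scheme on top of it is not described in the article). Instead of position embeddings, ALiBi adds a penalty to each attention score that grows linearly with query-key distance, and because that penalty is defined for any distance, it extrapolates naturally to sequences longer than those seen in training:

```python
import numpy as np

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Standard ALiBi bias: a (num_heads, seq_len, seq_len) tensor
    added to attention scores before softmax.

    Each head gets its own slope from a geometric sequence, so some
    heads decay attention with distance quickly and others slowly.
    No learned position embedding is involved, which is why the
    scheme extends to unseen sequence lengths.
    """
    # head-specific slopes: 2^-1, 2^-2, ... for num_heads = 8
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    i = np.arange(seq_len)[:, None]           # query positions
    j = np.arange(seq_len)[None, :]           # key positions
    distance = np.maximum(i - j, 0)           # causal distance, >= 0
    return -slopes[:, None, None] * distance  # farther back => larger penalty

bias = alibi_bias(seq_len=4, num_heads=8)
# bias[0, 3, 0] == -0.5 * 3: head 0 penalizes a token 3 positions back
```

RoPE, the other encoding named, instead rotates query/key vectors by a position-dependent angle; both are common starting points for the long-window extrapolation work the article describes.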

Baichuan2-192K can be deeply integrated into vertical scenarios and genuinely assist people's work, life, and study, helping industry users cut costs and raise efficiency. For example, it can help fund managers summarize and interpret financial statements and analyze a company's risks and opportunities; help lawyers identify risks across multiple legal documents and review contracts and filings; help engineers read hundreds of pages of development documentation and answer technical questions; and help researchers quickly scan large numbers of papers and summarize the latest advances.

At present, Baichuan2-192K is open to Baichuan Intelligence's core partners via API calls and has reached cooperation agreements with financial media outlets and law firms; the company says access will be fully opened soon.

Wang Xiaochuan's team said that Baichuan2-192K's algorithmic and engineering innovations for long context windows verify their feasibility and open a new research path for improving large-model performance. Its longer context also lays a solid technical foundation for the industry to explore frontier areas such as agents and multimodal applications.
