Wang Xiaochuan's Baichuan Intelligence launches new large model, claimed to have the world's longest context window at 14 times that of GPT-4
Original source: Titanium Media
Author: Lin Zhijia
Competition in domestic large-model technology is accelerating: following the latest product launches from iFLYTEK Spark and Zhipu AI, Baichuan Intelligence has now delivered a new large-model milestone of its own.
Titanium Media learned that on the morning of October 30, Baichuan Intelligence, the AI large-model company founded by Wang Xiaochuan, announced the launch of Baichuan2-192K, a large model with a context window of up to 192K tokens that can process roughly 350,000 Chinese characters.
**Baichuan Intelligence says Baichuan2-192K currently offers the longest context window in the world: 4.4 times that of Claude2, previously the best long-context large model (100K context window, roughly 80,000 words in practice), and 14 times that of GPT-4 (32K context window, roughly 25,000 words in practice).** Beyond sheer window length, the company says the model also leads Claude2 in the quality of long-window text generation, long-context understanding, and long-text Q&A and summarization.
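The "4.4 times" and "14 times" multiples come from the measured text capacities quoted above (characters for Baichuan2-192K, words for Claude2 and GPT-4) rather than from the raw token counts (192K vs. 100K vs. 32K). A quick check of the article's own figures:

```python
# Approximate measured capacities as quoted in the article.
baichuan2_192k_chars = 350_000  # Chinese characters at 192K tokens
claude2_words = 80_000          # words at a 100K-token window
gpt4_words = 25_000             # words at a 32K-token window

# The claimed multiples compare these measured text lengths directly.
print(round(baichuan2_192k_chars / claude2_words, 1))  # 4.4
print(round(baichuan2_192k_chars / gpt4_words, 1))     # 14.0
```

Note that this compares Chinese characters against English words, which is how the article frames the comparison; the raw token-window ratios (1.92x and 6x) are smaller.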
It is reported that Baichuan2-192K will be offered to enterprise users via API calls and private deployment. Baichuan Intelligence has already launched a closed beta of the model's API, opening it to core partners in industries such as law, media, and finance.
In the past 200 days, Baichuan Intelligence has released a large model every 28 days on average: four open-source, free-for-commercial-use models (Baichuan-7B/13B and Baichuan2-7B/13B) and two closed-source models (Baichuan-53B and Baichuan2-53B), reaching a strong industry level in writing, text creation, and other capabilities. The Baichuan-7B and 13B open-source models rank among the best on many authoritative evaluation leaderboards, with more than 6 million cumulative downloads.
On building large AI models, Wang Xiaochuan has said that his team's existing technical tools are sufficient, that the company's real competitors are the open-source solutions of large companies, and that the team does not need to be large: 100 people are enough.
On August 31, Baichuan Intelligence was among the first to complete filing under the national "Interim Measures for the Management of Generative Artificial Intelligence Services", the only large-model startup founded this year among the first eight companies to do so. On September 25 it opened the Baichuan2-53B API, formally entering the To B enterprise market and beginning commercialization.
On October 17, Baichuan Intelligence announced it had completed a $300 million A1 strategic financing round, with participation from technology giants including Alibaba, Tencent, and Xiaomi, as well as several top investment institutions. Together with its $50 million angel round, the company's cumulative financing has reached $350 million (about 2.543 billion yuan).
Baichuan Intelligence did not disclose its current valuation, saying only that after this round the company has become a technology unicorn, generally defined as a valuation above $1 billion (about 7.266 billion yuan).
**In the Baichuan2-192K release, Baichuan Intelligence said the model performed well on 10 Chinese and English long-text Q&A and summarization benchmarks, including Dureader, NarrativeQA, LSHT, and TriviaQA, achieving SOTA on 7 of them, significantly surpassing other long-window models and leading Claude2 across the board.**
On the algorithm side, Baichuan Intelligence proposes an extrapolation scheme based on dynamic position encoding for RoPE and ALiBi, which strengthens the model's ability to capture long-range dependencies while preserving resolution; as the window length grows, Baichuan2-192K's sequence-modeling ability continues to improve. On the engineering side, building on its self-developed distributed training framework, the company integrates and optimizes multiple techniques into a comprehensive 4D-parallel distributed solution that automatically selects the most suitable distribution strategy for a given model workload, greatly reducing memory consumption during long-window training and inference.
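Baichuan has not published implementation details of its position-encoding extrapolation. As an illustration of one widely used approach in this family, the sketch below shows standard RoPE inverse frequencies alongside an "NTK-aware" dynamic rescaling of the RoPE base, which stretches the lowest frequencies so a model trained on short windows can attend over longer ones. The function names and the training/target lengths are hypothetical; this is not Baichuan's actual scheme, which also involves ALiBi.

```python
import numpy as np

def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE inverse frequencies: one per 2-D rotation pair."""
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

def ntk_scaled_frequencies(head_dim, train_len, target_len, base=10000.0):
    """NTK-aware scaling: enlarge the RoPE base so low frequencies are
    stretched to cover the longer target window while high frequencies
    (fine-grained local positions) are barely changed."""
    scale = target_len / train_len
    scaled_base = base * scale ** (head_dim / (head_dim - 2))
    return scaled_base ** (-np.arange(0, head_dim, 2) / head_dim)

def rotate(x, pos, inv_freq):
    """Apply the rotary encoding to one head vector x at position pos."""
    angles = pos * inv_freq            # one rotation angle per pair
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin    # 2-D rotation of each pair
    out[1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is simply rotated, vector norms (and hence attention score scales) are preserved; the scaling only changes how quickly relative phase accumulates with distance, which is what allows the window to extend without retraining from scratch.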
Baichuan2-192K can be deeply integrated with vertical scenarios to play a real role in people's work, life, and study, helping industry users cut costs and improve efficiency. For example, it can help fund managers summarize and interpret financial statements and analyze a company's risks and opportunities; help lawyers identify risks across multiple legal documents and review contracts and filings; help engineers read hundreds of pages of development documentation and answer technical questions; and help researchers quickly skim large numbers of papers and summarize the latest advances.
At present, Baichuan2-192K is open to Baichuan Intelligence's core partners via API, with cooperation already under way with financial media outlets and law firms; the company says it will open the model fully soon.
Wang Xiaochuan's team said that Baichuan2-192K's algorithmic and engineering innovations for long context windows verify their feasibility and open a new research path for improving large-model performance. Its longer context also lays a solid technical foundation for the industry's exploration of frontier areas such as agents and multimodal applications.