Diving into large models: the new narrative of mobile phone manufacturers

Author | Wu Jingjing

Editor | Chestnuts

Source | Jiazi Guangnian

Xiaomi's large model made its public debut during Lei Jun's 2023 annual speech.

Lei Jun mentioned that, unlike many Internet platforms, the key breakthrough direction for Xiaomi's large model is to be lightweight and locally deployable, so that it runs smoothly on the phone itself.

He said that the 1.3-billion-parameter MiLM-1.3B model already runs on mobile phones, with results comparable to those of 6-billion-parameter models running in the cloud. In the scorecard he showed, Xiaomi's on-device model outperformed Zhipu AI's ChatGLM2-6B across the topics of the CMMLU Chinese benchmark, and trailed Baichuan Intelligent's Baichuan-13B by only about 5 points.

(Source: Xiaomi)

Earlier, MiLM-6B/1.3B, the large pre-trained language models developed by Xiaomi, had already appeared on the code-hosting platform GitHub: MiLM-6B ranked tenth on the overall C-Eval leaderboard and first among models of the same parameter scale, and ranked first on the Chinese large model benchmark CMMLU.

Of course, since the dimensions of these benchmark lists are all public, it is not hard for large model companies to optimize their scores against the test tasks. These evaluation results should therefore be taken only as a reference and do not mean the results are absolutely excellent.

At the same time, Lei Jun announced that Xiao Ai, the first application of Xiaomi's large model, has received a major upgrade and officially opened invitation-based testing.

These are the staged large model results Xiaomi has delivered in the four months since it announced its new large model team in April this year.

What new thinking does Xiaomi's practice bring to the implementation of large models? And what does it mean for phone makers, whose fortunes ride on new technology cycles?

1. Xiaomi does not make general-purpose large models, and the core team has about 30 people

On the large model route, Xiaomi belongs to the rational school: **it does not pursue parameter scale, and it does not build general-purpose large models.**

Earlier, on an earnings call, Lu Weibing, president of Xiaomi Group, said that Xiaomi will actively embrace large models, that its direction is deep integration with products and business, and that it will not build general-purpose large models the way OpenAI does.

According to earlier reports by Shenran, Dr. Wang Bin, director of Xiaomi Group's AI Lab, once said that Xiaomi will not release a standalone ChatGPT-like product; the self-developed large model will ultimately be carried by products, with related investment at the level of tens of millions of RMB.

He said: "For large models, we belong to the rational school. Xiaomi has advantages in application scenarios, and what we see is a huge opportunity for the combination of large models and scenarios."

He revealed that before ChatGPT appeared, Xiaomi had already developed and applied large models internally, using pre-training plus supervised fine-tuning on downstream tasks for human-machine dialogue, at a parameter scale of 2.8 to 3 billion. That model was achieved mainly by fine-tuning a pre-trained base model on dialogue data, rather than being a general-purpose large model in today's sense.
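
For readers unfamiliar with that recipe, the sketch below shows what "pre-trained base plus supervised fine-tuning on dialogue data" looks like in practice. It is a minimal illustration: the base model ("gpt2"), the toy dialogues, and the hyperparameters are stand-ins chosen for demonstration, not Xiaomi's setup.

```python
# Minimal sketch of "pre-trained base + supervised fine-tuning on dialogue data".
# Model, data, and hyperparameters are illustrative placeholders.
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

dialogues = [  # toy supervised examples; real systems use large curated corpora
    "User: What's the weather like today?\nAssistant: Let me check that for you.",
    "User: Set an alarm for 7 am.\nAssistant: Done, your alarm is set for 7:00.",
]

class DialogueDataset(Dataset):
    """Tokenized dialogues, with padding positions masked out of the loss."""
    def __init__(self, texts):
        self.items = []
        for t in texts:
            enc = tokenizer(t, truncation=True, max_length=128,
                            padding="max_length", return_tensors="pt")
            ids = enc["input_ids"].squeeze(0)
            mask = enc["attention_mask"].squeeze(0)
            labels = ids.clone()
            labels[mask == 0] = -100  # ignore padding in the LM loss
            self.items.append({"input_ids": ids, "attention_mask": mask,
                               "labels": labels})
    def __len__(self):
        return len(self.items)
    def __getitem__(self, i):
        return self.items[i]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=DialogueDataset(dialogues),
)
trainer.train()  # the fine-tuned weights specialize the base model for dialogue
```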

According to public information, the current head of Xiaomi's large model team is Luan Jian, an expert in AI speech, who reports to Wang Bin, vice chairman of the technical committee and director of the AI Lab. The whole large model team has about 30 people.

Luan Jian was formerly chief speech scientist and speech team leader for the conversational bot "Microsoft XiaoIce", a researcher at the Toshiba (China) Research Institute, and a senior speech scientist at the Microsoft (China) Academy of Engineering. After joining Xiaomi, Luan Jian successively led the speech generation and NLP teams and the landing of those technologies in products such as Xiao Ai. Wang Bin joined Xiaomi in 2018 and has headed the AI Lab since 2019. Before joining Xiaomi, he was a researcher and doctoral supervisor at the Institute of Information Engineering, Chinese Academy of Sciences, with nearly 30 years of research experience in information retrieval and natural language processing.

The large model effort also rests on the AI team behind Xiaomi. Lei Jun said that after 7 years and 6 expansions, Xiaomi's AI team has grown to more than 3,000 people, covering CV, NLP, AI imaging, autonomous driving, robotics, and other fields.

(Source: Xiaomi)

2. Google, Qualcomm, and Huawei have entered the game one after another

Beyond Xiaomi, getting large models to run on mobile phones is currently a key goal for many technology companies.

Technology companies are imagining the possibilities of large models this way: whether you open WPS, Shimo Docs, or your email client, entering a command such as "write" lets the phone call on local model capability to generate a complete article or email. Every app on the phone could invoke the local large model at any time to handle work and everyday problems, and interacting with apps would no longer mean constant tapping but intelligent voice commands.

Many companies are trying to compress model sizes to make running large models locally on phones more practical and economical. At Google I/O in May this year, Google released PaLM 2 in four sizes, from small to large: Gecko, Otter, Bison, and Unicorn. The smallest, Gecko, can run on mobile phones, and quickly: it processes 20 tokens per second, roughly 16 or 17 words, and it can run on a phone offline. At the time, however, Google did not say which phone the model would ship in.

So far it is Qualcomm that has delivered concrete results. At MWC 2023 in March, Qualcomm ran Stable Diffusion, a text-to-image model with more than 1 billion parameters, on a smartphone powered by the second-generation Snapdragon 8. In the demonstration, staff used Stable Diffusion to generate images on an Android phone with no Internet connection, and the whole process took 15 seconds.

At CVPR, the top academic conference in computer vision, in June, Qualcomm demonstrated the 1.5-billion-parameter ControlNet model running on an Android phone, with generation taking only 11.26 seconds. Ziad Asghar, Qualcomm's senior vice president of product management and head of AI, said: **technically, it takes less than a month to move these models of over 1 billion parameters onto phones.**

Qualcomm's latest move is an announced cooperation with Meta to explore applications and services based on the Llama 2 model that run without a network connection on Snapdragon-powered smartphones, PCs, AR/VR headsets, cars, and other devices. According to Qualcomm, compared with cloud-based LLMs, running a large language model such as Llama 2 locally on the device not only costs less and performs better, it also works without connecting to online services, and the resulting service is more personalized, more secure, and more private.

Apple, which has announced no large model plans, is also exploring putting large models on the device side. According to the Financial Times, Apple is recruiting engineers and researchers at full tilt to compress large language models so they can run efficiently on the iPhone and iPad, with the Machine Intelligence and Neural Design (MIND) team mainly responsible.

Meanwhile on GitHub, MLC LLM, a popular open-source project, already supports local deployment. It works around memory constraints by carefully planning allocation and aggressively compressing model parameters, and can run AI models on a wide range of hardware, including the iPhone. The project was developed by Tianqi Chen, CMU assistant professor and OctoML CTO, together with other researchers, building on machine learning compilation (MLC) technology to deploy models efficiently. Less than two days after MLC LLM went online, its GitHub stars approached 1,000, and users have already tested running a large language model locally on an iPhone in airplane mode.
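
A rough calculation shows why this kind of parameter compression is the crux of on-device deployment. The sketch below compares the memory footprint of model weights alone at different numeric precisions; the byte widths are standard, but the model sizes are chosen for illustration, and activations, KV cache, and runtime overhead are ignored.

```python
# Back-of-envelope: memory taken by model weights at different precisions.
# Illustrative figures only; real runtimes add activation and KV-cache memory.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_footprint_gb(n_params: float, precision: str) -> float:
    """Approximate size of the weights alone, in gigabytes."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for n_params, name in [(1.3e9, "1.3B model"), (7e9, "7B-class model")]:
    for precision in ("fp16", "int4"):
        gb = weight_footprint_gb(n_params, precision)
        print(f"{name} @ {precision}: {gb:.1f} GB")

# A 7B model needs ~14 GB at fp16 -- far beyond a phone's spare RAM --
# but only ~3.5 GB once quantized to 4 bits, which starts to fit.
```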

Unlike Google and Qualcomm abroad, which emphasize that large models can be deployed locally on the device and run offline, domestic phone makers currently give priority to putting large models behind the phone's voice assistant or existing image search features. The essence of such upgrades is still calling more cloud capability to use the large model.

This time, Xiaomi applied its large model to the voice assistant Xiao Ai. However, since details of Xiaomi's on-device model have not been disclosed, its future development path cannot be judged precisely. Given the local deployment and lightweight direction Lei Jun emphasized, Xiaomi may try running the large model offline on the phone in the future.

Huawei is likewise trying to land large models on phones, but the focus remains the voice assistant and image search scenarios. Back in April, the new smart image search feature of Huawei's newly released P60 was powered by multimodal large model technology, with the model miniaturized for the device side in the process. Recently, Huawei's newly upgraded device assistant Xiaoyi has also improved its experience with large models, adding capabilities such as recommending restaurants from voice prompts and summarizing text.

OPPO and vivo are pushing in the same direction. On August 13, OPPO announced that its new Xiaobu Assistant, built on AndesGPT, would soon open for trial use; available material indicates that with large model capabilities integrated, Xiaobu Assistant is stronger in dialogue, copywriting, and similar tasks. AndesGPT is a generative large language model built on a hybrid cloud architecture by OPPO's Andes Intelligent Cloud team.

For phone makers, whether through local deployment or by invoking cloud capability, the large model on the phone is a new opportunity that cannot be missed.

3. Running large models on phones: where are the key problems?

It is not an easy task to make a large model run on a mobile phone.

Computing power is the first problem. Using a large model on a phone requires not only cloud computing power but also that of the device itself. Because large models consume so many resources, every call carries a high cost. Alphabet chairman John Hennessy has mentioned that a search backed by a large language model costs ten times as much as a traditional keyword search. Last year Google handled 3.3 trillion search queries at roughly one-fifth of a cent each. Wall Street analysts predict that if Google used large language models to handle half of its search queries, each returning an answer of about 50 words, **Google could face an additional $6 billion in spending by 2024.**

(Source: Reuters)
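
These figures can be sanity-checked with simple arithmetic. The sketch below works only from the numbers quoted above; the analysts' $6 billion estimate evidently rests on additional assumptions of their own (for example, a lower inference cost per 50-word answer than a flat 10x rule of thumb), which the article does not spell out.

```python
# Back-of-envelope check on the search-cost figures quoted above.
queries_per_year = 3.3e12       # Google searches last year
cost_keyword = 0.002            # ~one-fifth of a US cent per query
cost_llm = 10 * cost_keyword    # Hennessy: LLM search ~10x keyword search

baseline = queries_per_year * cost_keyword
extra_if_half_llm = 0.5 * queries_per_year * (cost_llm - cost_keyword)

print(f"Baseline search cost:      ${baseline / 1e9:.1f}B per year")  # ~$6.6B
print(f"Extra cost if half go LLM: ${extra_if_half_llm / 1e9:.1f}B")  # ~$29.7B

# The naive 10x rule yields far more than the analysts' $6B -- a sign that
# their model assumes much cheaper inference per answer than the rule of thumb.
```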

Running large models on phones faces a similar cost problem. Qualcomm's report "Hybrid AI is the Future of AI" argues that, just as traditional computing evolved from mainframes and clients to a combination of cloud and edge devices, running large models on the device side likewise requires a hybrid AI architecture that distributes and coordinates AI workloads between the cloud and edge devices, **allowing phone makers to use edge computing power to reduce costs.** Local deployment of large models grows out of precisely this cost consideration.
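
What "distributing and coordinating AI workloads between the cloud and edge devices" could look like is easiest to see as a dispatch rule. The sketch below is a hypothetical illustration: the thresholds, helper functions, and routing criteria are all invented for the example and are not Qualcomm's design.

```python
# Hypothetical sketch of a hybrid AI dispatcher: light requests stay on-device,
# heavy ones go to the cloud. Thresholds and helpers are invented for illustration.
from dataclasses import dataclass

ON_DEVICE_MAX_TOKENS = 256   # assumed capacity of the local, compressed model

@dataclass
class Request:
    prompt: str
    needs_fresh_data: bool = False   # e.g. web search, live prices

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)    # crude 4-chars-per-token heuristic

def route(req: Request) -> str:
    if req.needs_fresh_data:
        return "cloud"               # local model has no network access
    if estimate_tokens(req.prompt) > ON_DEVICE_MAX_TOKENS:
        return "cloud"               # too long for the small on-device model
    return "device"                  # cheap, private, low-latency path

print(route(Request("Set a timer for ten minutes")))              # -> device
print(route(Request("Summarize this long report. " * 200)))       # -> cloud
```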

In addition, as a deeply personal possession, the phone is where data is generated, and a large amount of private data is stored locally. If local deployment can be achieved, it offers individuals protection in security and privacy.

This raises the second problem: if more device-side capability is to be used to run large models, how can the phone's power consumption be kept very low while the model remains effective?

Qualcomm has stated that the key to deploying large models on local devices such as phones lies in its full-stack AI optimization of hardware and software, including the Qualcomm AI Model Efficiency Toolkit (AIMET), the Qualcomm AI Engine, and the Qualcomm AI Stack, which together compress model size, accelerate inference, and reduce runtime latency and power consumption. Hou Jilei, Qualcomm's global vice president and head of Qualcomm AI Research, has mentioned that an important part of Qualcomm's high-efficiency AI R&D is overall model-efficiency research, aimed at shrinking AI models in multiple directions so they run efficiently on the hardware.

Model compression alone is no small difficulty. Some compression methods cost the large model performance, while certain techniques can compress with little or no loss; all of it requires engineering experiments in different directions with the help of various tools.
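
As a concrete instance of that trade-off, post-training quantization stores weights as 8-bit integers instead of 32-bit floats, cutting their size roughly fourfold at a usually small accuracy cost. Below is a minimal sketch using PyTorch's built-in dynamic quantization; the toy model is an assumption for illustration, and production toolchains such as AIMET add calibration, mixed precision, and accuracy-recovery steps on top.

```python
# Minimal post-training dynamic quantization sketch with PyTorch.
# The toy model stands in for a real network, not any vendor's pipeline.
import torch
import torch.nn as nn

model = nn.Sequential(           # toy float32 model
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
)

# Replace Linear layers with int8 dynamically-quantized versions.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
drift = (model(x) - quantized(x)).abs().max().item()
print(f"Max output drift after int8 quantization: {drift:.4f}")
# Weights shrink ~4x (int8 vs fp32); the small drift is the accuracy cost.
```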

These key software and hardware capabilities pose a serious challenge for phone makers. Today, many of them have taken the first step of getting a large model to run on a phone at all; the next step, getting better large models into every phone more economically and efficiently, is harder and more critical.

The adventure has just begun.
