Large AI models will revolutionize AI
Source: Economic Observer
Author: Shen Yiran
In April of this year, several researchers at a leading artificial intelligence company took notice of a new technology: SAM (Segment Anything Model). They quickly reported it to the head of their department. The company got its start in machine vision, and the technology the researchers had spotted was closely related to it. "With the advent of SAM, more and more AI people realize that large models are a shock to them," said one of the researchers.
A month later, the company began to allocate resources to develop a large visual model.
Over the following three months, leading machine vision AI companies turned their attention to the technology's potential. So far, AI companies such as SenseTime and CloudWalk Technology, as well as traditional security companies, have begun investing in this new round of technological competition.
SAM is an image segmentation model for general scenes, launched by Meta in April this year. Much as one talks to ChatGPT, humans can use simple language instructions to have SAM independently distinguish and reason about the content of an image. SAM is widely seen as ChatGPT's counterpart in the field of vision.
Enthusiasts around the world have had great fun using it to generate and cut out images, but Chinese researchers recognized SAM's real power: applied to autonomous driving or security monitoring to detect people, cars, and roads, it is a large model that fundamentally breaks the traditional machine vision playbook.
Segmenting and recognizing images is a core task of machine vision. In the past, every image segmentation task required training its own algorithm and annotating its own batch of data, letting the machine "see" the various objects in an image by piling up small models. SAM exhibits a new property: without building a small model for each specific task, the machine can autonomously segment any object in any image, even in unknown or blurry scenes, and the operation is extremely simple.
This means SAM is far more general-purpose, and that generality could dramatically reduce the cost of machine vision recognition, thereby reshaping the business models and competitive landscape built on the older technology.
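The "point prompt in, object mask out" interface that distinguishes SAM from per-task small models can be made concrete with a deliberately tiny toy. The snippet below is not SAM (which is a transformer trained on over a billion masks); it is a plain flood fill, and every name in it is hypothetical, included only to illustrate what promptable segmentation means as an interface.

```python
from collections import deque

def segment_from_point(image, seed, tol=10):
    """Toy 'promptable segmentation': given a 2-D grid of pixel
    intensities and a single click (the seed point), return a mask
    of the connected region whose values stay within `tol` of the
    seed value. This is a simple flood fill, NOT SAM's algorithm."""
    h, w = len(image), len(image[0])
    sr, sc = seed
    target = image[sr][sc]
    mask = [[False] * w for _ in range(h)]
    queue = deque([seed])
    mask[sr][sc] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr][nc]
                    and abs(image[nr][nc] - target) <= tol):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask

# A 4x4 "image": a bright 2x2 object (values near 200)
# on a dark background (value 10).
img = [
    [10, 10,  10,  10],
    [10, 200, 205, 10],
    [10, 198, 202, 10],
    [10, 10,  10,  10],
]
mask = segment_from_point(img, (1, 1))  # "click" on the object
print(sum(v for row in mask for v in row))  # prints 4 (object pixels)
```

The point is the interface, not the algorithm: one generic routine answers a click on any image, where the older approach described above would have required training a dedicated model per object category.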
Since 2016, hundreds of artificial intelligence companies have sprung up in China's huge market. Propelled by market competition and capital, several AI unicorns gradually took shape, such as SenseTime, CloudWalk Technology, Megvii Technology, and Yitu Technology. These companies brought AI into security, government affairs, and industry, and built moats out of algorithmic sophistication and advantages of scale.
But now, as the technology changes, the race may be starting over.
Feng Junlan, Chief Scientist of China Mobile Group and Vice Chairman of the China Artificial Intelligence Industry Development Alliance, told reporters that large AI models will bring a new artificial intelligence paradigm: under their impact, the so-called moats of the past AI field largely cease to exist. The emergence of SAM proves the feasibility of large visual models, upending machine vision's research frameworks, modes of interaction, and ways of delivering production services.
Luo Xun, an IEEE senior member, professor at Tianjin University of Technology, and expert in AR/VR technology, told reporters that the AI advantages leading companies built up earlier will be weakened to some extent by the rise of general-purpose large models. But whether those companies themselves grow weaker depends on how they transform.
Technical route
As an important branch of AI, machine vision aims to let computers imitate the human visual system in understanding and processing images and video.
After 2000, Geoffrey Hinton, Yann LeCun, and Yoshua Bengio, known as founding figures of artificial intelligence, achieved breakthroughs in deep learning, allowing machines to loosely simulate the human brain and to automatically learn and extract features from massive numbers of images.
2012 was an important milestone. The ImageNet project created by Stanford University professor Li Feifei pushed deep learning into the mainstream: by manually labeling large numbers of images, researchers could teach computers to recognize all kinds of objects, which greatly improved the accuracy of machine vision, reduced its cost, and made commercialization possible.
In April 2023 came a new change: Meta launched the image segmentation model SAM. As a large model, SAM not only equips the machine with eyes to perceive the outside world but also endows it with something like a real brain. It learns to observe, perceive, think, reason logically, and draw conclusions from images, and it is extremely simple to operate, much as ChatGPT takes commands through human-language dialogue.
In short, it achieves the goals of machine vision far more easily, with no need for massive image annotation or stacked algorithms, and with lower computing power consumption. Nvidia AI scientist Jim Fan said the SAM large model is machine vision's GPT-3 moment: it has grasped the general concept of an object, and can segment even unknown objects, unfamiliar scenes (such as underwater images), and ambiguous cases.
After releasing SAM, Meta also open-sourced the model and the training dataset behind it, and showcased application scenarios for SAM in AR, VR, content creation, and other fields.
Chinese enterprises and researchers quickly gauged SAM's potential commercial value. Applied to autonomous driving or security monitoring to detect people, cars, and roads, it could fundamentally upend traditional machine vision.
Feng Junlan said that large models will change how AI is supplied, greatly reducing complexity on the supply side and pushing marginal cost toward zero. The business side can express demand in simpler natural language, no longer relying on engineers communicating with machines through code and other professional instructions, and can flexibly deploy to different models according to its own needs, improving efficiency.
Zhu Bing, chief product officer of Uniview Technology, told reporters: "In the past, doing AI was like hauling boxes, relatively low-tech manual labor. When AI empowers single-point scenarios, everything is fragmented and customized; pre-sales efficiency, after-sales efficiency, and sales efficiency are all low, and both upstream and downstream of the industry suffer." For example, Zhu Bing said, the investment and cost for manufacturers to develop, collect materials, calibrate, and customize algorithms for different scenarios and regions are enormous. For customers, the custom development fee is also a considerable expense.
Today, replacing the old small-model playbook with a large model requires neither stacked algorithms nor large amounts of labeled data, and consumes very little computing power in the process. Commands can be given to the machine in simple human language rather than a professional programming language. Zhu Bing said large models have drastically reduced the cost of AI development and deployment, built a series of new plays, and restructured the industry order, especially in the computer vision industry: the technical barriers previously erected by large companies have been leveled, and everyone is back at the same starting line.
Influx
Around the previous generation of machine vision technology, a number of artificial intelligence companies were born in China, and their technologies came into wide use in camera surveillance and security-check identification for public security, subways, and commercial buildings.
"AI Four Tigers" refers to four Chinese artificial intelligence companies founded between 2011 and 2014: SenseTime, CloudWalk Technology, Megvii Technology, and Yitu Technology, all with machine vision as their core technology. AI's breakthrough along the deep learning route provided the technical foundation for the rise of this group of companies, and China's industrial advantages provided a market for their growth.
After SAM came out, they began to target this technology one after another.
Reporters learned from many industry insiders that, apart from Yitu Technology, the rest of the "AI Four Tigers" (SenseTime, CloudWalk Technology, and Megvii Technology) are all developing large visual models. Hikvision and Uniview Technology have also laid out related research and development.
In April, just a few days after Meta launched SAM, SenseTime released its "Ri Ri Xin" ("Daily New") large model. Tian Feng, dean of the SenseTime Intelligent Industry Research Institute, told reporters that the "Ri Ri Xin" series is a collection of multiple large models spanning natural language generation, image generation, and visual perception; among them, "Ruying", "Qiongyu", and "Gewu" are the vision-related large models.
In May, CloudWalk Technology released its "calm" large model, a multi-modal large model that includes vision. At a recent investor meeting, CloudWalk stated that visual large models are very important and that vision-led models will be launched in the future, both because the company has deep reserves in computer vision and because it needs multi-modal technology to solve customers' specific business problems.
Megvii and Yitu have yet to launch large models. Megvii told reporters it is "developing a large model, but it has not been launched or delivered to customers." In terms of direction, Megvii has chosen four research tracks: a general image large model, a video understanding large model, a computational photography large model, and an autonomous driving perception large model, and has made certain breakthroughs.
Su Lianjie, chief artificial intelligence analyst at research firm Omdia, told reporters that under the impact of large visual models, it is fairly reasonable for the "AI Four Tigers" to pivot quickly to large models and deploy vision-centric multi-modal large models.
Hikvision told investors in June this year: "We paid attention to the SAM model as soon as it was released and conducted a systematic evaluation." Zhu Bing told reporters that "Wutong", the AIoT industry large model Uniview is developing in-house, is built from a general large model plus industry scenarios plus training and tuning. It was first released on May 9 and was tested by a first batch of partners in June.
Hikvision and Uniview Technology are traditional security companies that began as equipment manufacturers. Having faced fierce competition after the "AI Four Tigers" entered the security industry, they have actively embraced machine vision technology to defend their market share.
At present, AI companies are beginning to reach a consensus that large models are epoch-making.
Tian Feng and Yao Zhiqiang, co-founder of CloudWalk Technology, both told reporters that AI 1.0 was the era of small models, in which enterprises mainly supplied proprietary small models, meeting specific scenario needs with point solutions. AI 2.0 is the era of large models, in which enterprises need a unified large-model technology platform: a multi-modal foundation model with general perception and cognition of the world, on top of which a series of industry small models are generated to serve professional scenarios and a far greater number of scenarios.
Yao Zhiqiang believes that an AI company still at the earlier stage may be able to solve many scenario problems, but its costs are hard to bring down, so economies of scale never materialize. Tian Feng believes the two eras will coexist for a long time; the relationship is not one eliminating the other but the two working in concert. For example, with a mixture-of-experts (MoE) structure, multiple models can be combined into a service in the AI 2.0 era, and 1.0-era models can be embedded as well.
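The MoE structure mentioned above can be sketched with a deliberately simplified toy: a gating function scores a set of "experts" (stand-ins for specialized small models, such as person or vehicle recognizers), and the service returns their weighted combination. Everything below is hypothetical illustration, not any company's actual implementation; real MoE layers use learned neural gates over sub-networks.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical "experts": stand-ins for AI 1.0-era small models
# that each specialize in one kind of input.
experts = [
    lambda x: x * 1.0,    # expert 0
    lambda x: x * 10.0,   # expert 1
    lambda x: x * 100.0,  # expert 2
]

def gate(x):
    """Toy gating network: score each expert by how close the input
    is to that expert's 'specialty' (centers 0, 5, 10 are made up)."""
    return softmax([-abs(x - center) for center in (0.0, 5.0, 10.0)])

def moe(x):
    """Mixture of experts: a weighted combination of expert outputs,
    so several models are served through one entry point."""
    weights = gate(x)
    return sum(w * f(x) for w, f in zip(weights, experts))

print(round(moe(5.0), 1))  # dominated by expert 1, prints 52.7
```

An input near 5 is routed almost entirely to expert 1, so embedding an existing 1.0-era model is just a matter of adding one more entry to `experts`; the gate decides when it is consulted.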
In the new competition, earlier accumulation of technology and investment in hardware will still play a role.
Tian Feng told reporters that SenseTime's "AI Large Device" intelligent computing center has powerful AI computing power, enough to train 20 large models with hundreds of billions of parameters, and is the key equipment for developing and training large models. SenseTime not only uses it itself but also opens it to large-model startups and R&D partners.
A relevant person in charge at CloudWalk told reporters that the company's CWOS operating system has inherent advantages in integrating super-large language models such as ChatGPT. At the same time, the system can feed data and information back to the large model based on actual production conditions, optimizing the model's training and tuning and improving its accuracy and efficiency.
Large models break into the market
"Even without the impact of large models, the 'AI Four Tigers' are in a period of confusion about their transformation, and need to think about their own value and way out," Su Lianjie said.
This group of artificial intelligence companies was long favored by capital and the market, and SenseTime and CloudWalk have both gone public. From 2018 to 2022, SenseTime invested a cumulative total of more than 12 billion yuan in research and development, and raised more than 5 billion yuan in its 2021 IPO; over the same period, CloudWalk invested a cumulative total of more than 2.2 billion yuan in R&D, and raised 1.7 billion yuan in its 2022 IPO.
The healthy interplay between technology and capital also gave China a leading position in visual recognition. Around 2018, China ranked second only to, or even ahead of, the United States in the number of artificial intelligence papers published and the amount of AI financing. In visual recognition especially, Chinese AI companies repeatedly broke records in international competitions and achieved excellent results.
But soon, as the market advanced, the potential of the original technology gradually peaked. In 2019, Zhang Bo, an academician of the Chinese Academy of Sciences, suggested in an exclusive interview with the Economic Observer that industrial applications on the existing technical route might already have touched their ceiling.
More importantly, from a commercial point of view, AI's original technical route never broke through its cost bottleneck, leaving most traditional industry customers unable to foot the bill. Zhu Bing said: "For many years we have not seen a vigorous flow of new orders. A large number of companies compete ruthlessly on the two tracks of person recognition and license-plate recognition. The fundamental reason is that more algorithms cannot produce economies of scale."
An AI researcher at a leading company told reporters that under the traditional approach, an AI company serving a car factory sells it a set of algorithms for recognizing roadblocks. A single roadblock-recognition algorithm costs on average more than 100,000 yuan and takes about two months, and the customer must supply tens of thousands of images for labeling. Yet one algorithm is not enough: real road scenes are complex, an algorithm suited to cars may not suit large trucks, recognition fails from another angle, and a partially occluded target is also hard to recognize.
To make equipment more intelligent, AI companies have to layer multiple algorithms, which in plain terms means stacking many small models. According to its financial reports, SenseTime has accumulated 67,000 commercial small models. Reporters learned from CloudWalk Technology that it, too, has thousands of commercial small models.
But training time and cost multiply in step.
Feng Junlan told reporters that one important reason many AI companies struggle to make money is the high cost of AI services, which leaves companies "earning one yuan and losing five". A model in which the more orders you take, the more you lose is hard for suppliers to sustain, so the demand side can only be a few key industries or those with strong ability to pay.
According to their financial reports, from 2018 to 2022 CloudWalk Technology accumulated losses of 3.1 billion yuan, and SenseTime accumulated losses of more than 40 billion yuan.
To further cut the cost of AI and expand the market, the strategies of the "AI Four Tigers" have diverged: SenseTime chose AI devices, CloudWalk chose operating systems, Megvii chose chips, and Yitu chose IoT.
From this perspective, large models may bring existing companies not only challenges but also brand-new business models and application scenarios.
The researcher quoted above said his company has tried hard to find AI business in more markets. For example, it once discussed AI monitoring with a supermarket, to detect whether salespeople were at their posts: the company would have to send five algorithm engineers whose salaries alone came to 300,000 yuan, while the total monthly salary of the client's dozens of salespeople was under 50,000 yuan. It also pitched AI quality inspection to a factory owner, detecting whether boxes on the assembly line were damaged; the owner judged that hiring workers was more economical.
Such requirements are collectively called AI's long-tail demands: a large number of small and medium-sized customers with weak ability to pay and no hard need for AI, only occasional special needs in certain scenarios, nice to have but dispensable, for which they are unwilling to pay millions. In this researcher's view, in the future a certain class of large model, or a set of multi-modal large models, could be applied to these visual detection scenarios. Drawing on large models' transfer and generalization abilities, only a small amount of data labeling and algorithm work would be needed, development cycles and computing power requirements would fall, costs would drop sharply, and customers would be more willing to pay.
Zhu Bing estimates that AI algorithms based on small models could satisfy less than 10% of fragmented demands. In the future, AI algorithms based on large models can probably raise that to more than 50%, improve the efficiency of the overall long-tail algorithms tenfold, and cut delivery time to within one person-week.
Yao Zhiqiang told reporters that once the technology is platform-based and standardized, AI companies can quickly adapt to massive numbers of scenarios and achieve mass application through a unified core technology platform.
Feng Junlan said that when the cost of consuming a technology is far less than the value the technology brings a business, the technology can be scaled and migrated to more, longer-tail markets. Satisfying this formula is the fundamental logic by which AI companies achieve profitability, and it means they have the opportunity to open up more blue-ocean markets.