AI large model folding: data annotation "migrant workers" earn no more than 5,000 yuan a month, and the unit price has dropped from 50 cents to 4 cents

Original source: Tech Planet

Image source: Generated by Unbounded AI

Zheng Wen still remembers that afternoon a few months ago. On that day, she made 20 cents an hour. She graduated from a junior college in Hunan and is a large model data annotator. Her daily work is not complicated - adding labels to the raw data (such as images, videos, texts, etc.) she receives.

However, large models have very high requirements for data quality. That day, a single picture was revised eight times before it was approved, and the revisions took a full hour. In other words, she made only 20 cents in that hour, whereas under normal circumstances she could draw 600 boxes and earn 12 yuan an hour. "Money is not easy to make," she repeatedly emphasized.
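The piece-rate arithmetic behind this anecdote can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the 0.20 yuan figure takes the "20 cents an hour" quoted at the start of the article as the earnings for the revision-heavy hour.

```python
def unit_price(hourly_earnings_yuan: float, boxes_per_hour: int) -> float:
    """Implied pay per box under a piece-rate system (hypothetical helper)."""
    return hourly_earnings_yuan / boxes_per_hour

# Figures quoted in the article
normal_hourly_yuan = 12.0   # a normal hour of box drawing
normal_boxes = 600          # boxes drawn in that hour
bad_hour_yuan = 0.20        # the hour spent on eight revisions (assumed: 20 cents)

per_box = unit_price(normal_hourly_yuan, normal_boxes)
shortfall = normal_hourly_yuan / bad_hour_yuan

print(f"Implied unit price: {per_box:.2f} yuan per box")
print(f"A normal hour pays about {shortfall:.0f}x the revision-heavy hour")
```

Under these numbers the normal rate works out to 2 cents per box, which shows how completely a single rejected picture can wipe out an hour of pay.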

This is the consensus of almost all data annotation practitioners. At one end of data annotation are the practitioners, who earn less than 5,000 yuan a month and build the cornerstone of large models like an army of ants. At the other end is the AI dream of the major Internet companies, which hope to surpass GPT-4.

Data annotation uses the most primitive piece-rate system to calculate wages, and there are no office politics. The only trouble is that the job is so boring that most people find it hard to last three months. And almost everyone told Tech Planet the same thing: you'd better not get into it.

But what they don't know is that most of them may soon lose even this boring job, because simple data annotation will be replaced by AI.

From 50 cents to 4 cents, the price plummeted

Lin Shuang made a lot of "quick money" in 2017: more than 6,000 yuan in 15 days. For Lin Shuang, who graduated from a junior college, this income is indeed considerable. At that time, people's expectations for AI were skyrocketing. Almost no one doubted its future. All investment institutions firmly believed that companies with a scale of billions, tens of billions, or even hundreds of billions could be born here.

Behind almost all AI technologies is a competition in algorithms, computing power, and data. Huge volumes of data are the foundation of technical excellence. Programmers with impressive backgrounds sit in offices in "Beijing, Shanghai and Guangzhou" and draw AI blueprints by iterating algorithms in code, while college students, mothers, and others sit in cubicles in third- and fourth-tier cities, processing the images, text, and voice recordings in huge data packages.

ChatGPT is no exception. An employee on Baidu's Wenxin Yiyan project team said that the large model itself involves no new technology and has no high technical barriers. The key issue is the parameter barrier formed by the computing power barrier.

Data annotators in the era of large models are not particularly different from those in the past. The few differences may be a more comfortable office environment and higher requirements for annotation quality. A data annotation practitioner told Tech Planet that when people first enter the industry, they usually form a team of about 10, one of whom is responsible for quality inspection. If the work is not up to standard, it is sent back to be redone. The quality of the data determines the quality of the large model.

Data migrant workers don't care about any new branches of AI technology. They care more about the unit price, because wages here are calculated on a piece-rate basis.

"At that time, when the unit price was high, a 2D frame paid more than 10 cents. At my peak, I worked for more than 10 hours and earned more than 600 yuan a day," Lin Shuang recalled. However, this was not the highest. One annotator said that the price of an early 2D frame could reach up to 50 cents.

Frame drawing (drawing bounding boxes) is a common operation in data annotation. The annotator marks the objects in the picture, such as vehicles, traffic lights, and obstacles, according to the requirements. Frames are divided into 2D and 3D, and the latter pays more.

But this popularity did not last long. With more and more people flooding in and the overall development of the AI industry not going smoothly, the unit price of annotating a picture kept falling. Lin Shuang said that the lowest price now is only 4 cents.

"For frame drawing, the average unit price in the industry is around 0.15 yuan, but it still depends on the project. To receive first-hand orders, the minimum requirement is around 100 employees, which is quite a large scale. A 3D frame may pay 30 cents apiece, but it is rare to get as high as 50 cents."

Of course, if you have professional knowledge in medical and financial fields, the unit price will be higher. For example, many large medical models require annotators to have clinical expertise and relevant experience.

The monthly income of most practitioners is no more than 5,000 yuan, though there are a few lucky ones among them. Yang Shuo originally ran a clothing store in Sichuan, but the epidemic hurt his business. He switched to large model data annotation this year and now earns 8,000 yuan per month. "I signed a contract with the company and paid a franchise fee of 9,500 yuan, and the contract states that the minimum monthly income is 7,000 yuan."

Who made the money

Internet giants such as Alibaba, Tencent, and ByteDance, as well as car companies such as SAIC and Lynk & Co, are the sources from which data annotation business is distributed. To obtain orders directly from the source at the best price, a data annotation company needs to have a certain scale.

An employee of a data annotation company told Tech Planet that they get orders directly from big tech companies, but those companies require them to have 500 people, so they meet the headcount requirement through franchising or subsidiaries.

The difference between the two is that franchising suits newcomers to the industry who want to set up a studio, while there is generally only one subsidiary per region. A newcomer's studio pays a franchise fee of 25,000 or 30,000 yuan. A subsidiary is the exclusive agent in its region and pays a fee of 50,000 yuan. In return, the company guarantees sufficient orders and provides technical training for three years. These studios and subsidiaries form a large union, ranging from several hundred to several thousand people.

Employees of the above-mentioned data annotation company said that the popularity of large models has once again pushed the data annotation industry into a craze, and now people visit their company almost every day.

But in fact, running a data labeling company is not easy. Data annotation companies will tell you that the business is hard only in the first one to two months, because employees need a ramp-up period; that 5 to 8 people are enough in the early stage; and that even a woman in her 40s can do the work without any problem.

Stability is the most important factor for a data annotation company or studio. However, most of the annotators Tech Planet spoke with quit within three months out of boredom, and new employees cannot be put to productive work immediately. The result of high staff turnover is that the quality and turnaround time of data annotation are not stable enough. Mothers who are short of money are the most sought-after hires for data annotation studios.

"Relying on part-timers definitely doesn't work. There will be idle gaps, and once you have invested in rent and computers, you will lose money. The best way is to have full-time employees," Wei Ming, who has run a data annotation studio, told Tech Planet.

For most data annotation companies, the payment collection cycle starts at 3 months and can run up to half a year, but they need to pay their employees monthly, which requires a certain capital reserve. "3,500 yuan per person, 100 people, 3 months is 1.05 million yuan."
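The quoted reserve figure is simple multiplication, sketched below. The function and parameter names are hypothetical, and the model assumes payroll is the only cost and that no client payment arrives before the cycle ends; rent and equipment, which the article also mentions, are left out.

```python
def payroll_reserve(monthly_wage_yuan: int, headcount: int, months_until_payment: int) -> int:
    """Cash needed to cover wages until the client's first payment arrives."""
    return monthly_wage_yuan * headcount * months_until_payment

# Figures from the quote: 3,500 yuan per person, 100 people, 3-month cycle
print(payroll_reserve(3500, 100, 3))  # 1050000

# The same studio under the half-year cycle mentioned in the article
print(payroll_reserve(3500, 100, 6))  # 2100000
```

A half-year cycle doubles the reserve to 2.1 million yuan, which helps explain why late or unsettled projects are so damaging to small studios.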

Zhang Jian once joined a union with more than 200 employees. In the first year, they caught the industry's boom period, when the unit price of a 2D frame was as high as 50 cents. That year, his union earned more than 4 million yuan.

But the next year, things took a turn for the worse. The annotation unit price fell, staff turnover rose, and the idle gaps grew longer. In addition, two major projects were never paid for. Over the whole year, they lost more than 3 million yuan. "The bosses have said that they will not touch data annotation in the short term," Zhang Jian said. "They are currently in a lawsuit with the upstream company."

This is a low-margin business. Haitian Ruisheng is the first main board-listed company in the data annotation industry. Last year, the company had revenue of 263 million yuan, profit of only 29.45 million yuan, and net profit margin of just over 10%. But in the first half of this year, the company fell into losses due to a decline in the number of customers.
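The "just over 10%" margin can be checked directly from the two figures above. This is a minimal sketch that assumes the 29.45 million yuan figure is net profit; the helper name is hypothetical.

```python
def net_margin(net_profit: float, revenue: float) -> float:
    """Net profit margin as a fraction of revenue."""
    return net_profit / revenue

revenue_million_yuan = 263.0
profit_million_yuan = 29.45  # assumed to be net profit

margin = net_margin(profit_million_yuan, revenue_million_yuan)
print(f"Net profit margin: {margin:.1%}")  # Net profit margin: 11.2%
```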

"Screws" that may be replaced at any time

Relying on the ant-like, bit-by-bit labor of workers in Kenya, OpenAI finally stood out with its large language dialogue model. These ordinary people, called data workers, support the AI dream of Sam Altman (the founder of OpenAI), but if nothing unexpected happens, most of the work in their hands will soon be taken over by the very products they helped create.

Abroad, Anthropic, established in 2021 by former employees of OpenAI, has raised US$5.15 billion this year, more than seven times its total financing in the past two years. The company offers a new way to train models with less human involvement.

This year, AI startup Refuel launched an open source tool called Autolabel, which can use mainstream large models on the market to label data sets. According to the company's test results, Autolabel's labeling efficiency is 100 times higher than manual labeling, and the cost is only 1/7 of the labor cost.

In China, a company called Vision Future is also building large-scale annotation models. In an interview, they said that some projects have been delivered using GPT, and the accuracy has reached more than 80%, which is close to manual work.

However, Haitian Ruisheng believes that AI will not achieve fully automated annotation, because for machines to keep evolving and come closer to human judgment and understanding, they will still need human guidance.

Almost everyone who has been engaged in data annotation revealed the same point of view to Tech Planet: Data annotation is a job with no threshold and only requires you to be proficient in using computers.

But in fact, if simple annotation can be done by AI, human work will shift to harder tasks such as data screening and standard setting. This means that the threshold of the industry will keep rising, especially for large language models such as ChatGPT and Wenxin Yiyan.

As a comparison, long before ChatGPT became popular, OpenAI organized more than a dozen doctoral students to do annotation. Baidu's data annotation base in Haikou has hundreds of full-time large model data annotators, and 100% of them hold a bachelor's degree.

The characteristic of this type of large language model is that the annotator needs a certain store of knowledge and the ability to analyze logically. According to a "Financial Eleven" report, annotators need to determine the type of question, and then score and rank each of the five answers. The score range is 0-5 points. If the score is lower than 3 points, the specific reason must be noted, such as "the answer does not address the question (0 points)", "seriously off-topic (1 point)", or "logical problems and factual errors, but in a small proportion (2 points)".
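The scoring rules described above can be sketched as a small validation routine. This is an illustrative sketch only, not the tooling used by any company in the article: the function names and the data layout (a list of scores plus a dictionary of reasons keyed by answer position) are assumptions.

```python
ANSWERS_PER_QUESTION = 5    # five answers are scored per question
MIN_SCORE, MAX_SCORE = 0, 5
REASON_REQUIRED_BELOW = 3   # scores under 3 must come with a reason

def validate_annotation(scores: list[int], reasons: dict[int, str]) -> list[str]:
    """Return a list of rule violations; an empty list means the annotation passes."""
    problems = []
    if len(scores) != ANSWERS_PER_QUESTION:
        problems.append(f"expected {ANSWERS_PER_QUESTION} scores, got {len(scores)}")
    for i, score in enumerate(scores):
        if not MIN_SCORE <= score <= MAX_SCORE:
            problems.append(f"answer {i}: score {score} is out of range")
        elif score < REASON_REQUIRED_BELOW and not reasons.get(i, "").strip():
            problems.append(f"answer {i}: score {score} needs a reason")
    return problems

def rank_answers(scores: list[int]) -> list[int]:
    """Answer positions ordered from best to worst; ties keep their original order."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

scores = [4, 1, 5, 3, 0]
reasons = {1: "seriously off-topic"}  # answer 4 scored 0 but has no reason yet

print(validate_annotation(scores, reasons))
print(rank_answers(scores))
```

In this example the check flags the zero-scored answer that lacks a reason, the kind of omission a quality inspector would send back for rework.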

Another popular area of data annotation is autonomous driving. According to a Deloitte report, labeling demand in the autonomous driving field accounted for 38% of all AI downstream applications in 2022, and the proportion is expected to rise to 52% by 2027. Compared with large language models, the simple box-drawing work for autonomous driving models still has relatively loose educational requirements.

Annotators are the cornerstone of humanity's move from the mobile Internet era to the artificial intelligence era. Most of the practitioners Tech Planet spoke with do not know what changes AI will bring to them, nor what they have contributed to the development of AI. They are just a new generation of screws in the Internet era, and they may be replaced at any time.

(Note: The characters in the article are all pseudonyms.)
