Inequality in AI models: using Chinese costs twice as much as English!
Source: Ifanr
Author: Mo Chongyu
Recently, X (formerly Twitter) user @Dylan Patel shared a University of Oxford study: by examining GPT-4 and most other common LLMs, the researchers found that the cost of LLM (Large Language Model) inference varies enormously between languages.
English input and output are far cheaper than those of other languages: Simplified Chinese costs about 2 times as much as English, Spanish about 1.5 times, and Shan (a language spoken in Myanmar) about 15 times.
The underlying cause traces back to a paper the Oxford researchers published on arXiv in May this year.
As generative AI is commercialized, the cost of compute is inevitably passed on to users, and many current AI services bill by the number of tokens they process.
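As a rough illustration of how per-token billing turns into per-language cost, the sketch below applies the multipliers quoted above to a hypothetical flat price; actual API prices vary by model and change over time.

```python
# Hypothetical flat price per 1K tokens; real API pricing varies by model and date.
PRICE_PER_1K_TOKENS_USD = 0.03

# Token-inflation multipliers relative to English, as quoted in the study.
MULTIPLIERS = {"English": 1.0, "Spanish": 1.5, "Simplified Chinese": 2.0, "Shan": 15.0}

# Suppose a document takes 1,000 tokens to express in English.
english_tokens = 1_000

for language, factor in MULTIPLIERS.items():
    tokens = english_tokens * factor
    cost = tokens / 1_000 * PRICE_PER_1K_TOKENS_USD
    print(f"{language}: {tokens:,.0f} tokens -> ${cost:.2f}")
```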
The paper shows that after analyzing 17 tokenization methods, the researchers found that the length of the token sequence the same text is converted into differs greatly between languages, and those lengths are anything but fair.
For example, according to OpenAI's GPT-3 tokenizer, "your love" takes only two tokens in English but eight tokens in Simplified Chinese, even though the Simplified Chinese text runs only 4 characters while the English text runs 14.
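The same comparison is easy to reproduce with OpenAI's open-source tiktoken library. A minimal sketch, assuming the r50k_base vocabulary used by the original GPT-3 models and an assumed Chinese rendering of the phrase, so exact counts may differ slightly from the figures above:

```python
import tiktoken

# "r50k_base" is the BPE vocabulary used by the original GPT-3 models.
enc = tiktoken.get_encoding("r50k_base")

# The Chinese phrase is an assumed rendering of the article's example.
samples = {
    "English": "your love",
    "Simplified Chinese": "你的爱意",
}

for language, text in samples.items():
    n_tokens = len(enc.encode(text))
    print(f"{language}: {len(text)} characters -> {n_tokens} tokens")
```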
There are many similar cases. Aleksandar Petrov's website offers plenty of related charts and data; interested readers can visit it to see the differences between languages for themselves.
OpenAI's official website has a similar page explaining how the API tokenizes a piece of text and displaying the total token count. It also notes that one token usually corresponds to roughly 4 characters of English text, and that 100 tokens work out to about 75 words.
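Those rules of thumb are easy to encode. A minimal English-only sketch, using exactly the conversion factors from OpenAI's note above:

```python
def estimate_tokens_from_words(word_count: int) -> int:
    """Rough English-only estimate: 100 tokens ~= 75 words."""
    return round(word_count * 100 / 75)

def estimate_tokens_from_chars(char_count: int) -> int:
    """Rough English-only estimate: 1 token ~= 4 characters."""
    return round(char_count / 4)

print(estimate_tokens_from_words(750))   # -> 1000
print(estimate_tokens_from_chars(4000))  # -> 1000
```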
Beyond cost, this difference in token-sequence length also leads to unfair processing latency (some languages take longer to process the same content) and unfair modeling of long-sequence dependencies (some languages can fit only shorter text into the same context).
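A hedged sketch of that long-sequence point: with a fixed context window, a language whose tokenizer inflates sequences fits proportionally less text. The window size here is illustrative; the multipliers are those cited above.

```python
CONTEXT_WINDOW_TOKENS = 4_096  # illustrative; actual limits are model-dependent

# Token-inflation multipliers relative to English, from the figures cited above.
MULTIPLIERS = {"English": 1.0, "Simplified Chinese": 2.0, "Shan": 15.0}

for language, factor in MULTIPLIERS.items():
    budget = CONTEXT_WINDOW_TOKENS / factor
    # ~0.75 English words per token, per OpenAI's rule of thumb.
    words = budget * 0.75
    print(f"{language}: ~{budget:,.0f} tokens of effective budget "
          f"(~{words:,.0f} English-equivalent words)")
```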
Simply put, users of certain languages pay more, wait longer, and get worse performance. That reduces their fair access to language technology and indirectly creates an AI divide between English speakers and the rest of the world's language users.
On output cost alone, Simplified Chinese costs twice as much as English. As the AI field develops further, being perpetually "one step behind" is clearly unfriendly to Simplified Chinese. Weighing cost against other factors, non-English-speaking countries have begun developing their own native-language models.
Subsequently, strong large models such as Alibaba's Tongyi Qianwen and Huawei's Pangu emerged one after another.
Among them, the NLP model in Huawei's Pangu family is billed as the industry's first hundred-billion-parameter Chinese large model, with 110 billion dense parameters trained on 40 TB of data.
As UN Deputy Secretary-General Amina Mohammed once warned at the UN General Assembly, if the international community does not act decisively, the digital divide will become "the new face of inequality".
Likewise, as generative AI races ahead, the AI gap may well become the next "new face of inequality" worth watching.
Fortunately, the domestic tech giants that are so often criticized have already taken action.