Google's AI is eating everything! Privacy policy updated to allow crawling all public content for AI training

Original source: Qubit

Image source: Generated by Unbounded AI

From now on, every word you say publicly on the Internet may be used by Google to train AI!

That's right: after artwork, written works are now also being fed to large models.

Tech blogs, code, papers, or anything else you post publicly online can be thrown into the "Google large-model blender," even if it is copyrighted.

Just this week, Google updated its privacy policy to make clear that it reserves the right to scrape all public content online to build its AI tools.

Netizens reacted immediately. One warned that "Google is crawling everything":

Once Google can read what you write, it becomes their "property".

Others took an even more pessimistic view:

Soon, all content producers will be AI.

So, what exactly changed in this version of the privacy policy?

For training AI products such as Bard

The story starts with Google's recently updated privacy policy.

In its latest privacy policy, Google added a clause about AI models under "research and development":

Google uses information to improve our services and develop new products, features and technologies to benefit our users and the public. For example, we use public information to help train Google's AI models and build useful products and features (such as Google Translate, Bard, and Cloud AI features).

In other words, any public information Google can collect may be used to train AI-related products or features such as Google Translate, Bard, and Cloud AI.

So, what exactly does this public information include?

It includes Internet, network, and other activity information, such as search terms, app and browser interactions with Google services, and use of Google services on third-party websites and apps.

In other words, Google may collect for large-model training not only blogs and other content that was already public, but also Google Docs published on the Internet and posts that contain personal information.

Of course, for now this is still limited to "public information."

Email in services such as Gmail, which Google itself provides, should still not be swept into the training data.

Google's privacy policy also clearly states that it may use such personal or public information for other purposes, such as preventing security threats, content review, service maintenance, personalized advertising, or legal compliance.

But why is Google updating this policy at this juncture?

"AI is challenging text copyright"

It may be related to the recent rate-limiting moves by companies such as Reddit and Twitter.

First, in April this year, Reddit announced that it would charge companies for API access.

Reddit's CEO believes the site's data is very valuable, and the company does not want to hand that content to large technology companies for free.

Later, Twitter also began rate-limiting, saying it did not want AI companies to "freeload on its data."

These policies have hit users and third-party tools hard. On Reddit, they triggered a large-scale protest, with many moderators shutting down their own forums in response. Many people condemned the changes, and some netizens even said that "Twitter has been killed."

Either way, AI freeloading on data has become a conflict that can no longer be ignored.

Regarding the matter of Google AI crawling data, some netizens expressed doubts:

Search engines and other Internet services have long crawled data, so why are people only now resisting "AI crawling"?

One netizen responded:

At its core, this is a copyright issue. Merely quoting copyrighted material does not necessarily infringe copyright, but if using AI to "blend and launder" copyrighted content becomes legal, then copyright is effectively dead.

This is exactly why he is pessimistic about the matter:

If someone copied your blog without crediting the source, used your open source code in a paid service, or used your Stack Overflow answers to answer questions themselves, could you accept that? Everything I did was free. But if AI now wants me to disappear, then I will disappear.

Of course, some netizens have accepted the new policy, while stressing that everyone still needs to stay vigilant:

Peruse the new policy and notice how much information we are leaking online.

So, what do you think about this?

