How to trust AI: what ideas does zero-knowledge machine learning (ZKML) offer?
Summary
**As AI evolves at an unimaginable speed, it inevitably raises concerns about the other "edge" of the AI sword: trust.** The first concern is privacy: in the age of AI, how can humans trust AI with their data? The transparency of AI models may be an even greater worry: emergent capabilities like those of large language models amount to an impenetrable technological "black box" for humans. Ordinary users cannot understand how a model works or how its results are produced, and worse, as a user you may not know whether the AI model a service provider offers actually works as promised. When AI algorithms and models are applied to sensitive data in fields such as healthcare, finance, and Internet applications, whether the model is biased (or even maliciously steered), and whether the service provider runs the model (and its parameters) faithfully as promised, have become users' foremost concerns. Zero-knowledge proof technology offers a targeted solution here, which is why zero-knowledge machine learning (ZKML) has become the latest direction of development.
**Weighing computational integrity, heuristic optimization, and privacy together, the combination of zero-knowledge proofs and AI, zero-knowledge machine learning (ZKML), came into being.** In an era when AI-generated content is ever closer to human-generated content, the technical properties of zero-knowledge proofs can help us confirm that specific content was generated by a specific model. For privacy protection, zero-knowledge proof technology is particularly important: the proof can be generated and verified without revealing the user's input data or the specific details of the model.
**There are five ways zero-knowledge proofs can be applied to machine learning: computational integrity, model integrity, verification, distributed training, and authentication.** The recent rapid development of large language models (LLMs) shows that these models are becoming more and more intelligent, and they refine an important interface between algorithms and humans: language. The trend toward artificial general intelligence (AGI) looks unstoppable, and judging from current training results, AI can already imitate highly capable humans in digital interactions and is surpassing human levels at an unimaginable pace; one has to marvel at this evolutionary speed, and even worry about being quickly replaced by AI.
**Community developers have used ZKML to verify Twitter's recommendation feature, which is instructive.** Twitter's "For You" feature uses an AI recommendation algorithm to distill the roughly 500 million tweets posted each day into a handful of popular tweets, which are ultimately displayed on the user's home timeline. At the end of March 2023, Twitter open-sourced the algorithm, but because the model's details were not made public, users still cannot verify whether the algorithm runs accurately and completely. Community developer Daniel Kang and others used the cryptographic tool ZK-SNARKs to check whether the Twitter recommendation algorithm runs correctly and completely without disclosing its details; this is the most attractive property of a zero-knowledge proof: proving the credibility of information without revealing any specific information about its subject (zero knowledge). Ideally, Twitter could use ZK-SNARKs to publish proofs of its ranking model: proofs that, when the model is applied to specific users and tweets, it produces the claimed final output ranking. Such proofs are the basis of the model's trustworthiness: users can verify that the model algorithm's computation performs as promised, or submit it to a third party for audit, all without disclosing the model's parameter weights. In other words, using the officially published model proof, a user can verify, for a specific questionable tweet, that the model handled it honestly as promised.
1. Core Ideas
As AI evolves at an unimaginable speed, it inevitably raises concerns about the other "edge" of the AI sword: trust. The first concern is privacy: in the age of AI, how can humans trust AI with their data? The transparency of AI models may be an even greater worry: emergent capabilities like those of large language models amount to an impenetrable technological "black box" for humans. Ordinary users cannot understand how a model operates or how its results are obtained (the model itself being full of capabilities that are hard to understand or predict); worse, as a user you may not know whether the AI model a service provider offers works as promised. When AI algorithms and models are applied to sensitive data in fields such as healthcare, finance, and Internet applications, whether the model is biased (or even maliciously steered), and whether the service provider runs the model (and its parameters) faithfully as promised, have become users' foremost concerns.
Zero-knowledge proof technology offers a targeted solution here, which is why zero-knowledge machine learning (ZKML) has become the latest direction of development. This article discusses the technical characteristics of ZKML, its potential application scenarios, and some inspiring cases, and examines ZKML's development direction and its possible industrial impact.
**2. The "other edge" of the AI sword: how to trust AI? **
The capabilities of artificial intelligence are rapidly approaching those of humans, and have already surpassed humans in many niche domains. The recent rapid development of large language models (LLMs) suggests these models are becoming increasingly intelligent, and they refine an important interface between algorithms and humans: language. The trend toward artificial general intelligence (AGI) looks unstoppable, and judging from current training results, AI can already imitate highly capable humans in digital interactions and is surpassing human levels at an unimaginable pace. Language models have made significant progress recently: products represented by ChatGPT have performed amazingly, reaching more than 20% of human ability in most routine evaluations, and comparing GPT-3.5 with GPT-4, released only a few months apart, one has to marvel at the speed of this evolution. But the other side of the coin is concern about AI capabilities getting out of control.
**First is privacy.** In the AI era, with the development of technologies such as facial recognition, users worry about the risk of data leakage whenever they use AI services. This has become an obstacle to AI's adoption and development: from the perspective of privacy, how can users trust AI?
**Perhaps the transparency of AI models is the greater concern.** Emergent capabilities like those of large language models amount to an impenetrable technological "black box": ordinary users cannot understand how the model operates or how its results are obtained (the model itself being full of capabilities that are hard to understand or predict). More troubling, as a user you may not know whether the AI model the service provider offers works as promised. When AI algorithms and models are applied to sensitive data in fields such as healthcare, finance, and Internet applications, whether the model is biased (or even maliciously steered), and whether the service provider runs the model (and its parameters) faithfully as promised, become users' foremost concerns. For example, does the social platform make recommendations using an algorithm that treats everyone equally? Are the recommendations from a financial service provider's AI algorithm as accurate and complete as promised? Does the medical plan recommended by AI involve unnecessary spending? Will service providers accept audits of their AI models?
Simply put, on the one hand, users do not know the true nature of the AI model a service provider offers, and they worry that the model is discriminatory: that it contains biased or otherwise skewed factors that could cause them unknown losses or negative effects.
On the other hand, the self-evolution of AI seems increasingly unpredictable, and ever more powerful AI models seem to be slipping further beyond the possibility of human control, **so trust has become the other "edge" of the sharp sword of AI.**
User trust in AI needs to be established from the perspectives of data privacy, model transparency, and model controllability. Users have to worry both about privacy protection and about whether the algorithmic model runs accurately and completely as promised; this is not an easy task. On model transparency, providers resist model audits and supervision, citing trade secrets among other concerns; meanwhile, the evolution of the algorithmic model itself is not easy to control, and this uncontrollability must also be taken into account.
On protecting user data privacy, we have done extensive research in earlier reports such as "AI and Data Elements Driven by Web3.0: Openness, Security and Privacy". Some Web3.0 applications are very inspiring in this regard: they train AI models on the premise of full user data ownership confirmation and data privacy protection.
However, the market is currently so dazzled by the performance of large models such as ChatGPT that it has not weighed the privacy issues of the models themselves, or the trust issues (including those brought by uncontrollability) arising from the algorithms' "emergent" characteristics. At another level, users have always been skeptical about whether the so-called algorithmic models operate accurately, completely, and honestly. Therefore, the trust problem of AI should be addressed at three levels: users, service providers, and model uncontrollability.
3. ZKML: The combination of zero-knowledge proof and AI brings trust
3.1. Zero-knowledge proofs: zk-SNARKs, zk-STARKs and other technologies are maturing
Zero-knowledge proof (ZKP) was first proposed by MIT's Shafi Goldwasser and Silvio Micali, together with Charles Rackoff, in the 1985 paper "The Knowledge Complexity of Interactive Proof Systems". The authors observed that a prover can convince a verifier of the authenticity of data without revealing the data itself. Consider a public function f(x) and an output value y: Alice tells Bob that she knows a value x with f(x) = y, but Bob does not believe her. Alice uses a zero-knowledge proof algorithm to generate a proof, and Bob verifies that proof to confirm whether Alice really knows an x that satisfies f(x) = y.
For example, with a zero-knowledge proof you can learn whether Xiao Ming's exam score meets some requirement, such as whether he passed, or whether he got more than 60% of the fill-in-the-blank questions right, without learning the score itself. In the field of AI, zero-knowledge proofs can likewise provide a reliable trust tool for AI models.
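To make the prover and verifier roles concrete, here is a minimal runnable sketch of a classic interactive zero-knowledge proof, the Schnorr protocol, in Python. It is not one of the SNARK/STARK systems discussed below, and the tiny group parameters are purely illustrative; real deployments use groups of cryptographic size.

```python
# Minimal sketch of an interactive zero-knowledge proof (Schnorr protocol).
# Alice proves she knows x such that y = g^x mod p, without revealing x.
# Toy parameters for illustration only; real systems use ~256-bit groups.
import secrets

p, q, g = 23, 11, 4          # g generates a subgroup of prime order q mod p

x = 7                        # Alice's secret
y = pow(g, x, p)             # public value: y = g^x mod p

# 1. Commit: Alice picks a random nonce r and sends t = g^r mod p.
r = secrets.randbelow(q)
t = pow(g, r, p)

# 2. Challenge: Bob replies with a random challenge c.
c = secrets.randbelow(q)

# 3. Response: Alice answers with s = r + c*x mod q.
s = (r + c * x) % q

# 4. Verify: Bob checks g^s == t * y^c (mod p). The check passes only if
#    Alice knows x, yet the transcript (t, c, s) reveals nothing about x.
assert pow(g, s, p) == (t * pow(y, c, p)) % p
print("proof accepted")
```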
A zero-knowledge proof can be interactive, meaning the prover must prove the authenticity of the data to each verifier separately; or non-interactive, meaning the prover creates a single proof that anyone can later verify.
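The standard way to remove the interaction is the Fiat-Shamir transform: derive the challenge by hashing the commitment, so the prover outputs one proof that any number of verifiers can check later. A sketch, reusing the toy parameters above:

```python
# Non-interactive variant via the Fiat-Shamir transform: the challenge is
# a hash of the public values and the commitment, so no verifier
# interaction is needed and the proof (t, s) can be checked by anyone.
import hashlib
import secrets

p, q, g = 23, 11, 4          # toy subgroup parameters, as above

def fs_challenge(y: int, t: int) -> int:
    """Derive the challenge by hashing the public statement and commitment."""
    digest = hashlib.sha256(f"{g},{p},{y},{t}".encode()).digest()
    return int.from_bytes(digest, "big") % q

def prove(x: int, y: int) -> tuple[int, int]:
    r = secrets.randbelow(q)
    t = pow(g, r, p)
    c = fs_challenge(y, t)
    s = (r + c * x) % q
    return t, s               # the whole proof

def verify(y: int, t: int, s: int) -> bool:
    c = fs_challenge(y, t)
    return pow(g, s, p) == (t * pow(y, c, p)) % p

x = 7
y = pow(g, x, p)
t, s = prove(x, y)
assert verify(y, t, s)        # any verifier can reuse the same (t, s)
```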
Zero-knowledge work splits into proving and verification. Generally speaking, proving is quasi-linear in the size of the computation, i.e., on the order of T*log(T), while verification is far cheaper.
Assuming verification time scales as the square of the logarithm of the number of transactions, the machine verification time for a block of 10,000 transactions is VTime = (log2 10000)^2 ≈ (13.3)^2 ≈ 177 ms. Increase the block size a hundredfold (to 1,000,000 tx/block), and the validator's new running time is VTime = (log2 1000000)^2 ≈ 20^2 ≈ 400 ms. This is the source of its remarkable scalability: in theory, TPS can grow almost without bound.
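A quick sanity check of that arithmetic (the figures are relative cost units under the stated assumptions, not measured milliseconds):

```python
# Back-of-the-envelope scaling: verification grows as (log2 N)^2 while
# proving grows quasi-linearly as N * log2(N), so a 100x larger block
# barely changes the verifier's work.
import math

for n_tx in (10_000, 1_000_000):
    verify_cost = math.log2(n_tx) ** 2          # ~177, then ~397
    prove_cost = n_tx * math.log2(n_tx)
    print(f"{n_tx:>9} tx: verify ~ {verify_cost:4.0f} units, "
          f"prove ~ {prove_cost:12,.0f} units")
```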
**Verification is very fast; all the difficulty lies in generating the proof.** As long as proofs can be generated fast enough, on-chain verification is simple. There are currently many implementations of zero-knowledge proofs, such as zk-SNARKs, zk-STARKs, PLONK, and Bulletproofs. Each method has its own advantages and disadvantages in proof size, prover time, and verification time.
The more complex and larger the zero-knowledge proof system, the higher the performance and the shorter the verification time it can achieve. Comparisons of mainstream proof systems show that STARKs and Bulletproofs require no trusted setup, and that as transaction data surges from 1 TX to 10,000 TX, the proof size of the latter grows even less. The advantage of Bulletproofs is that the proof size is logarithmic (even when f and x are large), making it possible to store the proof within a block, but the computational complexity of verifying it is linear. Clearly each algorithm involves many trade-offs and leaves much room for improvement. In practice, however, generating proofs is far harder than one might imagine, so the industry is now devoting itself to the problem of proof generation.
Although zero-knowledge proof technology is not yet developed enough to match the scale of a large language model (LLM), its technical implementations already have inspiring application scenarios. Especially amid the development of AI's double-edged sword, zero-knowledge proofs provide a reliable solution for AI trust.
3.2. Zero-knowledge machine learning (ZKML): trustless AI
In an era when AI-generated content is ever closer to human-generated content, the technical properties of zero-knowledge proofs can help us confirm that specific content was generated by applying a specific model. For privacy protection, zero-knowledge proof technology is particularly important: the proof and its verification can be completed without revealing the user's input data or the specific details of the model. Considering computational integrity, heuristic optimization, and privacy together, the combination of zero-knowledge proofs and AI, zero-knowledge machine learning (ZKML), came into being.
Here are five ways zero-knowledge proofs can be applied to machine learning. Beyond basic capabilities such as computational integrity, model integrity, and user privacy, zero-knowledge machine learning can also enable distributed training, which will promote the integration of AI and blockchain, and the identification of humans in the AI jungle (see our report "OpenAI Founder's Web3 Vision: Worldcoin Creates AI Digital Passport").
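The core idea scales down to a toy that can run here: treat a single secret exponent x as the "model weight", publish a commitment y = g^x, and let "inference" be out = inp^x mod p. A Chaum-Pedersen-style proof (a sibling of the Schnorr protocol above) then shows that the output really was computed with the committed weight, without revealing x. This one-parameter toy is only a stand-in for what production ZKML systems do for entire neural networks:

```python
# Toy "ZKML": the model weight is a secret exponent x, committed as
# y = g^x mod p. "Inference" maps an input inp (a group element) to
# out = inp^x mod p. A Chaum-Pedersen proof shows the SAME secret
# exponent links (g -> y) and (inp -> out), i.e. the output came from
# the committed model, without revealing x. Toy parameters only.
import hashlib
import secrets

p, q, g = 23, 11, 4               # toy prime-order subgroup, as above

def challenge(*vals: int) -> int:
    data = ",".join(map(str, vals)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove_inference(x: int, y: int, inp: int, out: int):
    r = secrets.randbelow(q)
    t1, t2 = pow(g, r, p), pow(inp, r, p)     # commitments in both bases
    c = challenge(g, y, inp, out, t1, t2)     # Fiat-Shamir challenge
    s = (r + c * x) % q
    return t1, t2, s

def verify_inference(y: int, inp: int, out: int, proof) -> bool:
    t1, t2, s = proof
    c = challenge(g, y, inp, out, t1, t2)
    return (pow(g, s, p) == (t1 * pow(y, c, p)) % p and
            pow(inp, s, p) == (t2 * pow(out, c, p)) % p)

x = 7                             # secret "model weight"
y = pow(g, x, p)                  # public commitment to the model
inp = pow(g, 5, p)                # an input encoded in the subgroup
out = pow(inp, x, p)              # claimed inference result

proof = prove_inference(x, y, inp, out)
assert verify_inference(y, inp, out, proof)
print("output verified against committed model; x never revealed")
# A tampered output (e.g. out*g) would fail verification with
# overwhelming probability at real parameter sizes.
```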
The AI large model's demand for computing power is plain to see, and interleaving ZK proofs into AI applications places new demands on hardware. The current state of the art for zero-knowledge systems combined with high-performance hardware still cannot prove anything as large as today's large language models (LLMs), but some progress has been made in creating proofs for smaller models. According to the Modulus Labs team, which tested existing ZK proof systems against models of various sizes, proof systems such as plonky2 can create a proof for a model of roughly 18 million parameters in about 50 seconds on a powerful AWS machine.
In terms of hardware, current options for ZK technology include GPUs, FPGAs, and ASICs. Note that zero-knowledge proofs are still at an early stage of development: there is little standardization, and algorithms are constantly being updated and changed. Each algorithm has its own characteristics and suits different hardware, and each will improve as projects require, so it is difficult to definitively say which algorithm is best.
It should be noted that there is as yet no clear research evaluating existing hardware systems for combining ZK with large AI models, so future hardware requirements remain highly variable and full of potential.
3.3. Inspiring Case: Validating the Twitter Recommendation Ranking Algorithm
Twitter's "For You" recommendation function uses an AI recommendation algorithm to refine the approximately 500 million tweets posted every day into a handful of popular tweets, which are finally displayed on the "For You" timeline on the user's homepage. The recommendation extracts latent information from tweet, user, and engagement data to be able to provide more relevant recommendations. At the end of March 2023, Twitter open-sourced the algorithm that selects and ranks posts on the timeline for the recommendation feature "For You". The recommendation process is roughly as follows:
Generate user behavior features from users' interactions with the site, and pull the best tweets from different recommendation sources;
Use the AI algorithm model to rank each tweet;
Apply heuristics and filters, such as filtering out tweets the user has blocked or already seen.
The core module of the recommendation algorithm is Home Mixer, the service responsible for building and serving the "For You" timeline. It acts as an algorithmic backbone connecting candidate sources, scoring functions, heuristics, and filters.
The "For You" recommendation function predicts and scores the relevance of each candidate tweet based on approximately 1,500 potentially relevant candidate recommendations. Twitter's official website says that at this stage, all candidate tweets are treated equally. The core ranking is achieved through a neural network of about 48 million parameters, which is continuously trained on tweet interactions to optimize. This ranking mechanism considers thousands of features and outputs ten or so labels to score each tweet, where each label represents the probability of engagement, and then ranks tweets based on these scores.
Although this is an important step toward transparency for Twitter's recommendation algorithm, users still cannot verify whether the algorithm runs accurately and completely; one main reason is that the specific weight details of the ranking model were withheld to protect user privacy. The algorithm's transparency therefore remains in doubt.
Using ZKML (zero-knowledge machine learning), Twitter could prove that the model's weight details are accurate and complete (that the model and its parameters treat different users equally), striking a nice balance between protecting model privacy and providing transparency.
Community developer Daniel Kang and others used the cryptographic tool ZK-SNARKs to check whether the Twitter recommendation algorithm runs correctly and completely without disclosing its details. This is the most attractive property of a zero-knowledge proof: proving the credibility of information without revealing any specific information about its subject (zero knowledge). Ideally, Twitter would use ZK-SNARKs to publish proofs of its ranking model: proofs that, when the model is applied to specific users and tweets, it produces the claimed final output ranking. Such a proof is the basis of the model's trustworthiness: users can verify that the model algorithm's computation performs as promised, or submit it to a third party for audit, all without disclosing the model's parameter weights. In other words, using the officially published model proof, a user can verify, for a specific questionable tweet, that the model operated honestly as promised.
Suppose a user finds the "For You" timeline questionable, believing certain tweets should be ranked higher (or lower). If Twitter launched a ZKML proof capability, the user could use the official proof to check how the suspect tweet's calculated score (which determines its ranking) compares with other tweets in the timeline. A score that doesn't match indicates that for those particular tweets the algorithmic model was not operating honestly (some parameters were artificially tweaked). One way to understand it: although the company does not publish the model's specific details, the model comes with a magic wand (the proof the model generates). Waving this wand over any tweet displays its relevant ranking score, yet the wand cannot be used to recover the model's private details. Thus the model's details can be audited while their privacy is preserved.
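Concretely, the "magic wand" check might look like the sketch below: each timeline entry carries a claimed score and a proof, the user verifies every proof against the published model commitment, then confirms the displayed order follows the verified scores. Here `verify_score` is a hypothetical stand-in for a real ZK verifier (e.g., a zk-SNARK checker), not a real API:

```python
# Sketch of the user-side audit: verify each tweet's score proof against
# the published model commitment, then confirm the timeline order matches
# the verified scores. `verify_score` stands in for a real ZK verifier.
from typing import NamedTuple

class RankedTweet(NamedTuple):
    tweet_id: int
    claimed_score: float
    proof: bytes          # ZK proof published alongside the ranking

def verify_score(model_commitment: bytes, tweet_id: int,
                 claimed_score: float, proof: bytes) -> bool:
    """Hypothetical: checks a ZK proof that the committed model really
    assigned `claimed_score` to this tweet. Always True in this sketch."""
    return True

def audit_timeline(model_commitment: bytes,
                   timeline: list[RankedTweet]) -> list[int]:
    """Return tweet_ids that fail the audit: a bad proof, or a tweet
    displayed out of the order its verified score implies."""
    suspicious = [t.tweet_id for t in timeline
                  if not verify_score(model_commitment, t.tweet_id,
                                      t.claimed_score, t.proof)]
    # Displayed order must be non-increasing in verified score.
    for higher, lower in zip(timeline, timeline[1:]):
        if higher.claimed_score < lower.claimed_score:
            suspicious.append(lower.tweet_id)
    return suspicious

timeline = [RankedTweet(2, 2.2, b".."), RankedTweet(1, 1.1, b".."),
            RankedTweet(7, 1.4, b"..")]        # tweet 7 looks misplaced
print(audit_timeline(b"model-commitment", timeline))   # -> [7]
```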
From the model's perspective, ZKML allows the model to earn audits and user trust while its own privacy remains protected.