BIP 158 Compact Block Filters Explained
By Elle Mouton
In this article, I briefly describe the need for light clients in Bitcoin and why "compact block filters" serve this need better than Bloom filters. I then explain in depth how compact block filters work, with a step-by-step walkthrough of constructing one for a real testnet block.
What block filters are for
A Bitcoin light client is software that supports a Bitcoin wallet without storing the Bitcoin blockchain. This means it needs to be able to broadcast transactions to the network, but most importantly, it must be able to discover new transactions on the chain that are relevant to the wallet. There are two kinds of relevant transactions: those sending funds to the wallet (creating a new output paying one of the wallet's addresses), and those spending one of the UTXOs the wallet controls.
What's wrong with Bloom filters?
Before BIP 158, the most common approach for light clients was the Bloom filter described in BIP 37. The pattern of a Bloom filter is that you take each data object you are interested in (e.g., the script public keys of the UTXOs being spent and created), hash it several times with different seeds, and set the corresponding bits in a bitmap called a Bloom filter. This filter represents what you are interested in. You then send the filter to a Bitcoin node that you trust and ask it to send you everything that matches the filter.
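To make the pattern concrete, here is a toy sketch in Go. It is not BIP 37's actual construction (which uses MurmurHash3 with carefully sized parameters); the k hash functions are simulated here by seeding SHA256, purely for illustration:

package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
)

// bloomFilter is a toy Bloom filter. The k hash functions are
// simulated by prepending k different seed bytes before SHA256;
// real BIP 37 filters use MurmurHash3 with tweaked seeds.
type bloomFilter struct {
    bits []bool
    k    byte
}

// positions returns the k bit positions the data hashes to.
func (f *bloomFilter) positions(data []byte) []uint32 {
    pos := make([]uint32, 0, f.k)
    for seed := byte(0); seed < f.k; seed++ {
        h := sha256.Sum256(append([]byte{seed}, data...))
        pos = append(pos, binary.BigEndian.Uint32(h[:4])%uint32(len(f.bits)))
    }
    return pos
}

// add sets every bit position the data hashes to.
func (f *bloomFilter) add(data []byte) {
    for _, p := range f.positions(data) {
        f.bits[p] = true
    }
}

// mayContain reports whether all of the data's bit positions are set.
// A false result is definitive; a true result may be a false positive.
func (f *bloomFilter) mayContain(data []byte) bool {
    for _, p := range f.positions(data) {
        if !f.bits[p] {
            return false
        }
    }
    return true
}

func main() {
    f := &bloomFilter{bits: make([]bool, 64), k: 3}
    f.add([]byte("spk1"))
    fmt.Println(f.mayContain([]byte("spk1"))) // true
    fmt.Println(f.mayContain([]byte("spk2"))) // false (very likely)
}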
The problem is that this process is not very private: you still leak information to the Bitcoin node that receives the filter. The node can learn which transactions you are interested in and which you are not; it can also simply withhold things that match your filter. So it's not ideal for light clients. But it is not ideal for the Bitcoin nodes serving light clients either. Every time you send a filter, the node has to load the relevant blocks from disk and check whether any transactions match it. By sending fake filters, you can bombard it with work - essentially a DoS attack. Creating a filter takes very little effort, but responding to one takes a lot.
Compact block filters
OK, so the properties we need are: no work asymmetry between server and client, and better privacy. Compact block filters give us both.

With compact block filters, the server (a full node) constructs a deterministic filter for every block, covering all the data objects in that block. The filter only needs to be computed once and can be stored forever. If a light client requests the filter for a block, there is no asymmetry: the server does no more work than the requesting client. The light client can also download the filter from multiple sources and check that they are consistent; better yet, it can always download the full block and verify that the filter the server provided is correct (i.e., matches the block's contents).

This design is also more private. The light client no longer sends the server a fingerprint of the data it is interested in, which makes it much harder to analyze the client's activity. After obtaining a filter from the server, the client checks locally whether the filter contains the data it cares about, and only if it does will the client request the full block. Note, too, that to serve light clients this way, full nodes need to persist these filters, and light clients may want to persist a few filters as well, so keeping the filters as small as possible is important (which is why they are called "compact block filters").
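As a rough sketch of the resulting client flow (hypothetical interface and type names, not a real library API):

// A hypothetical sketch of the light-client sync loop. The Server,
// Filter, Block and Wallet types are illustrative stand-ins only.
type Block []byte

type Filter interface {
    // Match reports whether any of the given script public keys
    // may be contained in the block (false positives are possible).
    Match(scriptPubKeys [][]byte) bool
}

type Server interface {
    GetFilter(height int) Filter
    GetBlock(height int) Block
}

type Wallet interface {
    ScriptPubKeys() [][]byte
    Scan(Block)
}

func syncWallet(s Server, w Wallet, from, to int) {
    for h := from; h <= to; h++ {
        // The filter is deterministic, so it can be cross-checked
        // against other sources or against the full block itself.
        f := s.GetFilter(h)

        // Matching happens locally: nothing about the wallet's
        // addresses is revealed to the server.
        if f.Match(w.ScriptPubKeys()) {
            // A match may be a false positive, so download the
            // full block and scan it ourselves.
            w.Scan(s.GetBlock(h))
        }
    }
}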
Great! Now we're getting to the fun stuff. How is such a filter created? What does it look like?
What are our needs?
The basic information contained in the filter is: the script public key of each input of each transaction (that is, the script public key of each spent UTXO), and the script public key of each output of each transaction. Basically it's like this:
objects = {spk1, spk2, spk3, spk4, ..., spkN} // a list of N script public keys
Technically, we could stop here: such a list of script public keys can already act as our filter. It is a condensed version of the block containing exactly the information light clients need. With such a list, a light client can determine with 100% certainty whether a block contains a transaction of interest. But such a list is very large. So all of our next steps are about making the list as compact as possible. This is where things get interesting.
First, we can convert each object into a number within a certain range, such that the numbers are uniformly distributed over that range. Suppose we have 10 objects (N = 10) and some function that converts each object into a number. Say we choose the range [0, 10] (since we have 10 objects). Our "hash to number" function takes each object as input and produces a number in [0, 10], with the numbers uniformly distributed across the range. This means that, after sorting, we end up with a list like this (in a very, very ideal case):
numbers := {1,2,3,4,5,6,7,8,9,10}
A light client downloads such a filter, wanting to know whether an object it is looking for matches the filter. It simply takes the object it's interested in, runs the same "hash to number" function, and checks whether the result is in the filter. Where is the problem? The numbers in the filter already occupy every possibility in the space! No matter which object the light client cares about, the resulting number is guaranteed to match the filter. In other words, this filter has a false-positive rate of 1 (Translator's note: a "false positive" here means the filter says a block contains an object when the block actually does not). This is not good. It shows that in compressing the data into the filter, we lost too much information. We need a lower false-positive rate. So suppose we want a false-positive rate of 1 in 5. Then we can map the objects in the block uniformly onto numbers in the range [0, 50]:
We now have an ordered list of numbers uniformly distributed over the interval [0, 50], and we know there are 10 numbers in the list. This means we can deduce that the difference between two adjacent numbers in the sorted list is most likely about 5. In general, if we have N objects and want a false-positive rate of 1 in M, the size of the space should be N * M. That is, the numbers fall between 0 and N * M, but (after sorting) the expected difference between two adjacent numbers is M. And a number around M takes less space to store than a number from the full [0, N * M] range. So instead of storing each number itself, we can store the difference between adjacent numbers. In the example above, this means that instead of storing [0, 5, 10, 15, 20, 25, 30, 35, 40, 45] as the filter, we store [0, 5, 5, 5, 5, 5, 5, 5, 5, 5], from which it is trivial to reconstruct the original list. Storing the number 45 obviously takes more bits than storing the number 5, and at scale the savings add up. But why stop there? Let's compress a little more!
The next step is to use Golomb-Rice coding. This coding method is good at encoding a list in which every number is close to some expected value. And our list is exactly that! Each number in our list is likely to be close to 5 (close to our false-positive parameter M), so taking the quotient of each number divided by M (divide by 5 and throw away the remainder) will most likely give 0 (if the number is slightly less than 5) or 1 (if the number is slightly greater than 5). The quotient could also be 2, 3, and so on, but with much smaller probability. Great! We can take advantage of this: we use as few bits as possible to encode the small, likely quotients, and more bits to encode the larger but less likely ones. The quotient q is written in unary: q ones followed by a terminating zero (0 → "0", 1 → "10", 2 → "110", and so on). Then we also need to encode the remainder (since we want to reconstruct the whole list exactly), which is always in the range [0, M - 1] (here, [0, 4]), so we write it in plain binary using 3 bits. Encoding our differences list [0, 5, 5, 5, 5, 5, 5, 5, 5, 5] this way yields:
0000 10000 10000 10000 10000 10000 10000 10000 10000 10000
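As a sanity check, here is a toy encoder in Go that reproduces the bit string above. It uses this example's parameters (a unary quotient plus a plain 3-bit remainder for M = 5), not BIP 158's real ones:

package main

import (
    "fmt"
    "strings"
)

// encodeGolombRice encodes each difference as a unary quotient
// (q ones followed by a zero) and a fixed 3-bit binary remainder.
func encodeGolombRice(differences []uint64, m uint64) string {
    var sb strings.Builder
    for _, d := range differences {
        q, r := d/m, d%m
        sb.WriteString(strings.Repeat("1", int(q)))
        sb.WriteString("0")
        sb.WriteString(fmt.Sprintf("%03b", r))
    }
    return sb.String()
}

func main() {
    differences := []uint64{0, 5, 5, 5, 5, 5, 5, 5, 5, 5}
    fmt.Println(encodeGolombRice(differences, 5))
    // Prints: 0000100001000010000100001000010000100001000010000
}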
Before we move on to a more realistic case, let's see if we can recover our original list from this filter.
Now what we have is "0000100001000010000100001000010000100001000010000". We know the Golomb-Rice code, and we know that M is 5 (this is public knowledge, known to everyone using this filter construction). Since M is 5, we know the remainder is encoded with 3 bits. So we can convert this filter into the following list of (quotient, remainder) pairs:
[(0, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0)]
Knowing that each quotient was obtained by dividing by M (5), we can recover each difference as quotient * 5 + remainder:
[0, 5, 5, 5, 5, 5, 5, 5, 5, 5]
Also, we know that this list represents the difference between two adjacent numbers, so we can restore the original list:
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
A more realistic example
We are now going to try to construct a filter for a real Bitcoin testnet block. I will use block 2101914. Let's see what its filter looks like:
$ bitcoin-cli getblockhash 2101914
000000000000002c06f9afaf2b2b066d4f814ff60cfbc4df55840975a00e035c

$ bitcoin-cli getblockfilter 000000000000002c06f9afaf2b2b066d4f814ff60cfbc4df55840975a00e035c
{
  "filter": "5571d126b85aa79c9de56d55995aa292de0484b830680a735793a8c2260113148421279906f800c3b8c94ff37681fb1fd230482518c52df57437864023833f2f801639692646ddcd7976ae4f2e2a1ef58c79b3aed6a705415255e362581692831374a5e5e70d5501cdc0a52095206a15cd2eb98ac980c22466e6945a65a5b0b0c5b32aa1e0cda2545da2c4345e049b614fcad80b9dc9c903788163822f4361bbb8755b79c276b1cf7952148de1e5ee0a92f6d70c4f522aa6877558f62b34b56ade12fa2e61023abf3e570937bf379722bc1b0dc06ffa1c5835bb651b9346a270",
  "header": "8d0cd8353342930209ac7027be150e679bbc7c65cc62bb8392968e43a6ea3bfe"
}
Alright, let's see how we can construct this filter from the block.
The full code used here can be found in this GitHub repository. I'll only show some pseudocode snippets here. The core of the code is a function called constructFilter, which takes a bitcoind client and the block in question. It looks like this:
func constructFilter(bc *bitcoind.Bitcoind, block bitcoind.Block) ([]byte, error) {
    // 1. Collect all the objects from the block that we want to add to the filter.
    // 2. Convert these objects to numbers and sort them.
    // 3. Compute the differences between adjacent numbers in the sorted list.
    // 4. Encode these differences using Golomb-Rice coding.
}
So, the first step is to collect all the objects from the block that we want to add to the filter. From the BIP, we know these objects are every spent script public key and every output script public key. The BIP also sets a few extra rules: we skip the inputs of the coinbase transaction (they reference no previous outputs, so there is nothing to add), and we skip all OP_RETURN outputs. We also deduplicate the data: if the same script public key appears twice, it is only added to the filter once.
// The list of objects we want to add to the filter:
// all spent script public keys and every output script public key.
// We use a map so that duplicate script public keys are only added once.
objects := make(map[string]struct{})

// Iterate over every transaction in the block.
for i, tx := range block.Tx {
    // Add the script public key of every output
    // to our list of objects.
    for _, txOut := range tx.Vout {
        spk := txOut.PubKey
        if len(spk) == 0 {
            continue
        }
        // Skip the output if it is OP_RETURN (0x6a).
        if spk[0] == 0x6a {
            continue
        }
        objects[string(spk)] = struct{}{}
    }

    // Don't add the inputs of the coinbase transaction.
    if i == 0 {
        continue
    }

    // For every input, fetch the script public key it spends.
    for _, txIn := range tx.Vin {
        prevTx, err := bc.GetRawTransaction(txIn.Txid)
        if err != nil {
            return nil, err
        }

        spk := prevTx.Vout[txIn.Vout].PubKey
        if len(spk) == 0 {
            continue
        }
        objects[string(spk)] = struct{}{}
    }
}
OK, now all the objects are collected. Next, we define the variable N as the number of objects in the map. Here, N is 85.
The next step is to convert our objects into numbers uniformly distributed over an interval. Again, the size of the interval depends on how low a false-positive rate you want. BIP 158 defines the constant M as 784931, meaning the probability of a false positive is 1 in 784931. Just as before, we multiply the false-positive parameter M by N to get the range all the numbers fall into. We define this range as F, i.e., F = M * N. Here, F = 66,719,135. I won't go into the details of the function that maps objects to numbers (you can find them in the GitHub repository linked above). All you need to know is that it takes as input the object, the constant F (which defines the range the object must be mapped to), and a key (derived from the block hash). Once we have all the numbers, we sort the list in ascending order and then create a new list, differences, holding the difference between each pair of adjacent numbers in the sorted list.
numbers := make([]uint64, 0, N)

// Iterate over all objects and convert each into a number
// uniformly distributed in the range [0, F],
// collecting the results in the numbers list.
for o := range objects {
    // Using the given key, the max number (F) and the object
    // bytes (o), convert the object to a number between 0 and F.
    v := convertToNumber([]byte(o), F, key)
    numbers = append(numbers, v)
}

// Sort the numbers list.
sort.Slice(numbers, func(i, j int) bool { return numbers[i] < numbers[j] })

// Convert the list of numbers into a list of differences.
differences := make([]uint64, N)
for i, num := range numbers {
    if i == 0 {
        differences[i] = num
        continue
    }
    differences[i] = num - numbers[i-1]
}
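For the curious, here is a minimal sketch of what convertToNumber might look like, assuming the construction described in BIP 158: SipHash-2-4 keyed with the first 16 bytes of the block hash, followed by a "fast range reduction" onto [0, F). The dchest/siphash import is one implementation choice; the author's repository may do this differently:

import (
    "encoding/binary"
    "math/bits"

    "github.com/dchest/siphash"
)

// convertToNumber hashes the object to a 64-bit number with SipHash,
// then maps it onto [0, F) by taking the high 64 bits of the 128-bit
// product v * F (a fast range reduction that preserves uniformity).
// key is assumed to be the first 16 bytes of the block hash, per BIP 158.
func convertToNumber(o []byte, F uint64, key [16]byte) uint64 {
    k0 := binary.LittleEndian.Uint64(key[0:8])
    k1 := binary.LittleEndian.Uint64(key[8:16])

    // Hash the object to a uniformly distributed 64-bit value.
    v := siphash.Hash(k0, k1, o)

    // Reduce v from [0, 2^64) to [0, F).
    hi, _ := bits.Mul64(v, F)
    return hi
}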
Great! [Figure: the sorted list of numbers alongside the resulting list of differences.]
The final step is to encode the differences list using Golomb-Rice coding. Recall from the explanation above that we divide each difference by the most likely value and then encode the quotient and remainder. In our earlier example, I said that the most likely value is M and that the remainder falls in [0, M - 1]. However, BIP 158 does not do it this way, because M turns out not to be the parameter that actually minimizes the final filter size. Instead of M, it defines a new constant P and uses 2^P as the Golomb-Rice parameter, with P set to 19. This means we divide each difference by 2^19 to get the quotient, and the remainder is encoded in binary using 19 bits.
filter := bstream.NewBStreamWriter(0)

// For each value in the differences list, obtain the quotient
// and remainder of dividing by 2^P.
for _, d := range differences {
    q := math.Floor(float64(d) / math.Exp2(float64(P)))
    r := d - uint64(math.Exp2(float64(P))*q)

    // Encode the quotient in unary: q ones followed by a zero.
    for i := 0; i < int(q); i++ {
        filter.WriteBit(true)
    }
    filter.WriteBit(false)

    // Encode the remainder using P bits.
    filter.WriteBits(r, P)
}
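A side note on the floating-point division above: since 2^P is a power of two, the same quotient and remainder can be computed with a shift and a mask. A tiny equivalent sketch:

// splitDifference returns the Golomb-Rice quotient and remainder of d
// for the parameter 2^P, using integer operations only: a right shift
// divides by 2^P, and masking the low P bits gives the remainder.
func splitDifference(d uint64, P uint) (q, r uint64) {
    return d >> P, d & ((1 << P) - 1)
}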
Great! Now we can output the filter, getting:
71d126b85aa79c9de56d55995aa292de0484b830680a735793a8c2260113148421279906f800c3b8c94ff37681fb1fd230482518c52df57437864023833f2f801639692646ddcd7976ae4f2e2a1ef58c79b3aed6a705415255e362581692831374a5e5e70d5501cdc0a52095206a15cd2eb98ac980c22466e6945a65a5b0b0c5b32aa1e0cda2545da2c4345e049b614fcad80b9dc9c903788163822f4361bbb8755b79c276b1cf7952148de1e5ee0a92f6d70c4f522aa6877558f62b34b56ade12fa2e61023abf3e570937bf379722bc1b0dc06ffa1c5835bb651b9346a270
This is exactly the filter we got from bitcoind, except for the extra byte at the front! Why the difference? Because the BIP states that the value of N must be encoded in CompactSize format and prepended to the filter so that the receiver can decode it. Here N is 85, which encodes as the single byte 0x55. This is done as follows:
fd := filter.Bytes()

var buffer bytes.Buffer
buffer.Grow(wire.VarIntSerializeSize(uint64(N)) + len(fd))

// Write N as a CompactSize prefix, then append the filter bytes.
err = wire.WriteVarInt(&buffer, 0, uint64(N))
if err != nil {
    return nil, err
}

_, err = buffer.Write(fd)
if err != nil {
    return nil, err
}
If we now print out the filter again, we will find that it is exactly the same as bitcoind's result:
5571d126b85aa79c9de56d55995aa292de0484b830680a735793a8c2260113148421279906f800c3b8c94ff37681fb1fd230482518c52df57437864023833f2f801639692646ddcd7976ae4f2e2a1ef58c79b3aed6a705415255e362581692831374a5e5e70d5501cdc0a52095206a15cd2eb98ac980c22466e6945a65a5b0b0c5b32aa1e0cda2545da2c4345e049b614fcad80b9dc9c903788163822f4361bbb8755b79c276b1cf7952148de1e5ee0a92f6d70c4f522aa6877558f62b34b56ade12fa2e61023abf3e570937bf379722bc1b0dc06ffa1c5835bb651b9346a270
Success!
However, my personal understanding is that there is no need to add N to the filter: if you know the value of P, you can recover N. Let's see if we can restore the original list of numbers using the first filter output (the one without the N prefix):
b := bstream.NewBStreamReader(filter)

var (
    numbers []uint64
    prevNum uint64
)

for {
    // Read the quotient from the bit stream. The number of 1s
    // before the first 0 is the quotient; the 0 marks its end.
    var q uint64
    c, err := b.ReadBit()
    if err != nil {
        return err
    }
    for c {
        q++
        c, err = b.ReadBit()
        if errors.Is(err, io.EOF) {
            break
        } else if err != nil {
            return err
        }
    }

    // The next P bits encode the remainder.
    r, err := b.ReadBits(P)
    if errors.Is(err, io.EOF) {
        break
    } else if err != nil {
        return err
    }

    // Reconstruct the difference, then the original number.
    n := q*uint64(math.Exp2(float64(P))) + r

    num := n + prevNum
    numbers = append(numbers, num)
    prevNum = num
}

fmt.Println(numbers)
The above produces the same list of numbers, so we can reconstruct the list without knowing N. So I'm not sure what the rationale is for including N in the filter. If you know why N must be included, please let me know!
Thanks for reading!