BIP 158 Compact Block Filters Explained
By Elle Mouton
In this article, I briefly describe the need for light clients in Bitcoin and why "compact block filters" serve this need better than Bloom filters. I then explain in depth how compact block filters work, with a step-by-step walkthrough of constructing one for a real testnet block.
What block filters are for
A Bitcoin light client is software that supports a Bitcoin wallet without storing the Bitcoin blockchain. This means it needs to be able to broadcast transactions to the network, but most importantly, it must be able to discover new transactions on the chain that are relevant to the wallet. There are two kinds of relevant transactions: those sending funds to the wallet (creating a new output paying one of the wallet's addresses), and those spending one of the UTXOs the wallet controls.
What's wrong with Bloom filters?
Before BIP 158, the most common approach for light clients was the Bloom filter described in BIP 37. The pattern of a Bloom filter is that you take each data object you are interested in (e.g., the script public keys of the UTXOs being spent and created), hash it several times with different seeds, and set the corresponding bits in a bitmap called a Bloom filter. This filter represents what you are interested in. You then send the filter to a Bitcoin node that you trust and ask it to send you everything that matches the filter.
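To make the pattern concrete, here is a toy sketch in Go. It is not BIP 37's actual construction (which uses MurmurHash3 with carefully sized parameters); the k hash functions are simulated here by seeding SHA256, purely for illustration:

package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
)

// bloomFilter is a toy Bloom filter. The k hash functions are
// simulated by prepending k different seed bytes before SHA256;
// real BIP 37 filters use MurmurHash3 with tweaked seeds.
type bloomFilter struct {
    bits []bool
    k    byte
}

// positions returns the k bit positions the data hashes to.
func (f *bloomFilter) positions(data []byte) []uint32 {
    pos := make([]uint32, 0, f.k)
    for seed := byte(0); seed < f.k; seed++ {
        h := sha256.Sum256(append([]byte{seed}, data...))
        pos = append(pos, binary.BigEndian.Uint32(h[:4])%uint32(len(f.bits)))
    }
    return pos
}

// add sets every bit position the data hashes to.
func (f *bloomFilter) add(data []byte) {
    for _, p := range f.positions(data) {
        f.bits[p] = true
    }
}

// mayContain reports whether all of the data's bit positions are set.
// A false result is definitive; a true result may be a false positive.
func (f *bloomFilter) mayContain(data []byte) bool {
    for _, p := range f.positions(data) {
        if !f.bits[p] {
            return false
        }
    }
    return true
}

func main() {
    f := &bloomFilter{bits: make([]bool, 64), k: 3}
    f.add([]byte("spk1"))
    fmt.Println(f.mayContain([]byte("spk1"))) // true
    fmt.Println(f.mayContain([]byte("spk2"))) // false (very likely)
}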
The problem is that this process is not very private: you still leak information to the Bitcoin node that receives the filter. The node can learn which transactions you are interested in and which you are not; it can also simply withhold things that match your filter. So it's not ideal for light clients. But it is not ideal for the Bitcoin nodes serving light clients either. Every time you send a filter, the node has to load the relevant blocks from disk and check whether any transactions match it. By sending fake filters, you can bombard it with work - essentially a DoS attack. Creating a filter takes very little effort, but responding to one takes a lot.
Compact block filters
OK, so the properties we need are: no work asymmetry between server and client, and better privacy. Compact block filters give us both.

With compact block filters, the server (a full node) constructs a deterministic filter for every block, covering all the data objects in that block. The filter only needs to be computed once and can be stored forever. If a light client requests the filter for a block, there is no asymmetry: the server does no more work than the requesting client. The light client can also download the filter from multiple sources and check that they are consistent; better yet, it can always download the full block and verify that the filter the server provided is correct (i.e., matches the block's contents).

This design is also more private. The light client no longer sends the server a fingerprint of the data it is interested in, which makes it much harder to analyze the client's activity. After obtaining a filter from the server, the client checks locally whether the filter contains the data it cares about, and only if it does will the client request the full block. Note, too, that to serve light clients this way, full nodes need to persist these filters, and light clients may want to persist a few filters as well, so keeping the filters as small as possible is important (which is why they are called "compact block filters").
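As a rough sketch of the resulting client flow (hypothetical interface and type names, not a real library API):

// A hypothetical sketch of the light-client sync loop. The Server,
// Filter, Block and Wallet types are illustrative stand-ins only.
type Block []byte

type Filter interface {
    // Match reports whether any of the given script public keys
    // may be contained in the block (false positives are possible).
    Match(scriptPubKeys [][]byte) bool
}

type Server interface {
    GetFilter(height int) Filter
    GetBlock(height int) Block
}

type Wallet interface {
    ScriptPubKeys() [][]byte
    Scan(Block)
}

func syncWallet(s Server, w Wallet, from, to int) {
    for h := from; h <= to; h++ {
        // The filter is deterministic, so it can be cross-checked
        // against other sources or against the full block itself.
        f := s.GetFilter(h)

        // Matching happens locally: nothing about the wallet's
        // addresses is revealed to the server.
        if f.Match(w.ScriptPubKeys()) {
            // A match may be a false positive, so download the
            // full block and scan it ourselves.
            w.Scan(s.GetBlock(h))
        }
    }
}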
Great! Now we're getting to the fun stuff. How is such a filter created? What does it look like?
What are our needs?
The basic information contained in the filter is: the script public key of each input of each transaction (that is, the script public key of each spent UTXO), and the script public key of each output of each transaction. Basically it's like this:
objects = {spk1, spk2, spk3, spk4, ..., spkN} // a list of N script public keys
Technically, we could stop here: such a list of script public keys can already act as our filter. It is a condensed version of the block containing exactly the information light clients need. With such a list, a light client can determine with 100% certainty whether a block contains a transaction of interest. But such a list is very large. So all of our next steps are about making the list as compact as possible. This is where things get interesting.
First, we can convert each object into a number within a certain range, such that the numbers are uniformly distributed over that range. Suppose we have 10 objects (N = 10) and some function that converts each object into a number. Say we choose the range [0, 10] (since we have 10 objects). Our "hash to number" function takes each object as input and produces a number in [0, 10], with the numbers uniformly distributed across the range. This means that, after sorting, we end up with a list like this (in a very, very ideal case):
numbers := {1,2,3,4,5,6,7,8,9,10}
A light client downloads such a filter, wanting to know whether an object it is looking for matches the filter. It simply takes the object it's interested in, runs the same "hash to number" function, and checks whether the result is in the filter. Where is the problem? The numbers in the filter already occupy every possibility in the space! No matter which object the light client cares about, the resulting number is guaranteed to match the filter. In other words, this filter has a false-positive rate of 1 (Translator's note: a "false positive" here means the filter says a block contains an object when the block actually does not). This is not good. It shows that in compressing the data into the filter, we lost too much information. We need a lower false-positive rate. So suppose we want a false-positive rate of 1 in 5. Then we can map the objects in the block uniformly onto numbers in the range [0, 50]:
We now have an ordered list of numbers uniformly distributed over the interval [0, 50], and we know there are 10 numbers in the list. This means we can deduce that the difference between two adjacent numbers in the sorted list is most likely about 5. In general, if we have N objects and want a false-positive rate of 1 in M, the size of the space should be N * M. That is, the numbers fall between 0 and N * M, but (after sorting) the expected difference between two adjacent numbers is M. And a number around M takes less space to store than a number from the full [0, N * M] range. So instead of storing each number itself, we can store the difference between adjacent numbers. In the example above, this means that instead of storing [0, 5, 10, 15, 20, 25, 30, 35, 40, 45] as the filter, we store [0, 5, 5, 5, 5, 5, 5, 5, 5, 5], from which it is trivial to reconstruct the original list. Storing the number 45 obviously takes more bits than storing the number 5, and at scale the savings add up. But why stop there? Let's compress a little more!
The next step is to use Golomb-Rice coding. This coding method is good at encoding a list in which every number is close to some expected value. And our list is exactly that! Each number in our list is likely to be close to 5 (close to our false-positive parameter M), so taking the quotient of each number divided by M (divide by 5 and throw away the remainder) will most likely give 0 (if the number is slightly less than 5) or 1 (if the number is slightly greater than 5). The quotient could also be 2, 3, and so on, but with much smaller probability. Great! We can take advantage of this: we use as few bits as possible to encode the small, likely quotients, and more bits to encode the larger but less likely ones. The quotient q is written in unary: q ones followed by a terminating zero (0 → "0", 1 → "10", 2 → "110", and so on). Then we also need to encode the remainder (since we want to reconstruct the whole list exactly), which is always in the range [0, M - 1] (here, [0, 4]), so we write it in plain binary using 3 bits. Encoding our differences list [0, 5, 5, 5, 5, 5, 5, 5, 5, 5] this way yields:
0000 10000 10000 10000 10000 10000 10000 10000 10000 10000
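As a sanity check, here is a toy encoder in Go that reproduces the bit string above. It uses this example's parameters (a unary quotient plus a plain 3-bit remainder for M = 5), not BIP 158's real ones:

package main

import (
    "fmt"
    "strings"
)

// encodeGolombRice encodes each difference as a unary quotient
// (q ones followed by a zero) and a fixed 3-bit binary remainder.
func encodeGolombRice(differences []uint64, m uint64) string {
    var sb strings.Builder
    for _, d := range differences {
        q, r := d/m, d%m
        sb.WriteString(strings.Repeat("1", int(q)))
        sb.WriteString("0")
        sb.WriteString(fmt.Sprintf("%03b", r))
    }
    return sb.String()
}

func main() {
    differences := []uint64{0, 5, 5, 5, 5, 5, 5, 5, 5, 5}
    fmt.Println(encodeGolombRice(differences, 5))
    // Prints: 0000100001000010000100001000010000100001000010000
}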
Before we move on to a more realistic case, let's see if we can recover our original list from this filter.
Now what we have is "0000100001000010000100001000010000100001000010000". We know the Golomb-Rice code, and we know that M is 5 (this is public knowledge, known to everyone using this filter construction). Since M is 5, we know the remainder is encoded with 3 bits. So we can convert this filter into the following list of (quotient, remainder) pairs:
[(0, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0)]
Knowing that each quotient was obtained by dividing by M (5), we can recover each difference as quotient * 5 + remainder:
[0, 5, 5, 5, 5, 5, 5, 5, 5, 5]
Also, we know that this list represents the difference between two adjacent numbers, so we can restore the original list:
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
A more realistic example
We are now going to try to construct a filter for a real Bitcoin testnet block. I will use block 2101914. Let's see what its filter looks like:
$ bitcoin-cli getblockhash 2101914
000000000000002c06f9afaf2b2b066d4f814ff60cfbc4df55840975a00e035c

$ bitcoin-cli getblockfilter 000000000000002c06f9afaf2b2b066d4f814ff60cfbc4df55840975a00e035c
{
  "filter": "5571d126b85aa79c9de56d55995aa292de0484b830680a735793a8c2260113148421279906f800c3b8c94ff37681fb1fd230482518c52df57437864023833f2f801639692646ddcd7976ae4f2e2a1ef58c79b3aed6a705415255e362581692831374a5e5e70d5501cdc0a52095206a15cd2eb98ac980c22466e6945a65a5b0b0c5b32aa1e0cda2545da2c4345e049b614fcad80b9dc9c903788163822f4361bbb8755b79c276b1cf7952148de1e5ee0a92f6d70c4f522aa6877558f62b34b56ade12fa2e61023abf3e570937bf379722bc1b0dc06ffa1c5835bb651b9346a270",
  "header": "8d0cd8353342930209ac7027be150e679bbc7c65cc62bb8392968e43a6ea3bfe"
}
Alright, let's see how we can construct this filter from the block.
The full code used here can be found in this GitHub repository. I'll only show some pseudocode snippets here. The core of the code is a function called constructFilter, which takes a bitcoind client and the block in question. It looks like this:
func constructFilter(bc *bitcoind.Bitcoind, block bitcoind.Block) ([]byte, error) {
    // 1. Collect all the objects from the block that we want to add to the filter.
    // 2. Convert these objects to numbers and sort them.
    // 3. Compute the differences between adjacent numbers in the sorted list.
    // 4. Encode these differences using Golomb-Rice coding.
}
So, the first step is to collect all the objects from the block that we want to add to the filter. From the BIP, we know these objects are every spent script public key and every output script public key. The BIP also sets a few extra rules: we skip the inputs of the coinbase transaction (they reference no previous outputs, so there is nothing to add), and we skip all OP_RETURN outputs. We also deduplicate the data: if the same script public key appears twice, it is only added to the filter once.
// The list of objects we want to add to the filter:
// all spent script public keys and every output script public key.
// We use a map so that duplicate script public keys are only added once.
objects := make(map[string]struct{})

// Iterate over every transaction in the block.
for i, tx := range block.Tx {
    // Add the script public key of every output
    // to our list of objects.
    for _, txOut := range tx.Vout {
        spk := txOut.PubKey
        if len(spk) == 0 {
            continue
        }
        // Skip the output if it is OP_RETURN (0x6a).
        if spk[0] == 0x6a {
            continue
        }
        objects[string(spk)] = struct{}{}
    }

    // Don't add the inputs of the coinbase transaction.
    if i == 0 {
        continue
    }

    // For every input, fetch the script public key it spends.
    for _, txIn := range tx.Vin {
        prevTx, err := bc.GetRawTransaction(txIn.Txid)
        if err != nil {
            return nil, err
        }

        spk := prevTx.Vout[txIn.Vout].PubKey
        if len(spk) == 0 {
            continue
        }
        objects[string(spk)] = struct{}{}
    }
}
OK, now all the objects are collected. Next, we define the variable N as the number of objects in the map. Here, N is 85.
The next step is to convert our objects into numbers uniformly distributed over an interval. Again, the size of the interval depends on how low a false-positive rate you want. BIP 158 defines the constant M as 784931, meaning the probability of a false positive is 1 in 784931. Just as before, we multiply the false-positive parameter M by N to get the range all the numbers fall into. We define this range as F, i.e., F = M * N. Here, F = 66,719,135. I won't go into the details of the function that maps objects to numbers (you can find them in the GitHub repository linked above). All you need to know is that it takes as input the object, the constant F (which defines the range the object must be mapped to), and a key (derived from the block hash). Once we have all the numbers, we sort the list in ascending order and then create a new list, differences, holding the difference between each pair of adjacent numbers in the sorted list.
numbers := make([]uint64, 0, N)

// Iterate over all objects and convert each into a number
// uniformly distributed in the range [0, F],
// collecting the results in the numbers list.
for o := range objects {
    // Using the given key, the max number (F) and the object
    // bytes (o), convert the object to a number between 0 and F.
    v := convertToNumber([]byte(o), F, key)
    numbers = append(numbers, v)
}

// Sort the numbers list.
sort.Slice(numbers, func(i, j int) bool { return numbers[i] < numbers[j] })

// Convert the list of numbers into a list of differences.
differences := make([]uint64, N)
for i, num := range numbers {
    if i == 0 {
        differences[i] = num
        continue
    }
    differences[i] = num - numbers[i-1]
}
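For the curious, here is a minimal sketch of what convertToNumber might look like, assuming the construction described in BIP 158: SipHash-2-4 keyed with the first 16 bytes of the block hash, followed by a "fast range reduction" onto [0, F). The dchest/siphash import is one implementation choice; the author's repository may do this differently:

import (
    "encoding/binary"
    "math/bits"

    "github.com/dchest/siphash"
)

// convertToNumber hashes the object to a 64-bit number with SipHash,
// then maps it onto [0, F) by taking the high 64 bits of the 128-bit
// product v * F (a fast range reduction that preserves uniformity).
// key is assumed to be the first 16 bytes of the block hash, per BIP 158.
func convertToNumber(o []byte, F uint64, key [16]byte) uint64 {
    k0 := binary.LittleEndian.Uint64(key[0:8])
    k1 := binary.LittleEndian.Uint64(key[8:16])

    // Hash the object to a uniformly distributed 64-bit value.
    v := siphash.Hash(k0, k1, o)

    // Reduce v from [0, 2^64) to [0, F).
    hi, _ := bits.Mul64(v, F)
    return hi
}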
Great! [Figure: the sorted list of numbers alongside the resulting list of differences.]
The final step is to encode the differences list using Golomb-Rice coding. Recall from the explanation above that we divide each difference by the most likely value and then encode the quotient and remainder. In our earlier example, I said that the most likely value is M and that the remainder falls in [0, M - 1]. However, BIP 158 does not do it this way, because M turns out not to be the parameter that actually minimizes the final filter size. Instead of M, it defines a new constant P and uses 2^P as the Golomb-Rice parameter, with P set to 19. This means we divide each difference by 2^19 to get the quotient, and the remainder is encoded in binary using 19 bits.
filter := bstream.NewBStreamWriter(0)

// For each value in the differences list, obtain the quotient
// and remainder of dividing by 2^P.
for _, d := range differences {
    q := math.Floor(float64(d) / math.Exp2(float64(P)))
    r := d - uint64(math.Exp2(float64(P))*q)

    // Encode the quotient in unary: q ones followed by a zero.
    for i := 0; i < int(q); i++ {
        filter.WriteBit(true)
    }
    filter.WriteBit(false)

    // Encode the remainder using P bits.
    filter.WriteBits(r, P)
}
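A side note on the floating-point division above: since 2^P is a power of two, the same quotient and remainder can be computed with a shift and a mask. A tiny equivalent sketch:

// splitDifference returns the Golomb-Rice quotient and remainder of d
// for the parameter 2^P, using integer operations only: a right shift
// divides by 2^P, and masking the low P bits gives the remainder.
func splitDifference(d uint64, P uint) (q, r uint64) {
    return d >> P, d & ((1 << P) - 1)
}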
Great! Now we can output the filter, getting:
71d126b85aa79c9de56d55995aa292de0484b830680a735793a8c2260113148421279906f800c3b8c94ff37681fb1fd230482518c52df57437864023833f2f801639692646ddcd7976ae4f2e2a1ef58c79b3aed6a705415255e362581692831374a5e5e70d5501cdc0a52095206a15cd2eb98ac980c22466e6945a65a5b0b0c5b32aa1e0cda2545da2c4345e049b614fcad80b9dc9c903788163822f4361bbb8755b79c276b1cf7952148de1e5ee0a92f6d70c4f522aa6877558f62b34b56ade12fa2e61023abf3e570937bf379722bc1b0dc06ffa1c5835bb651b9346a270
This is exactly the filter we got from bitcoind, except for the extra byte at the front! Why the difference? Because the BIP states that the value of N must be encoded in CompactSize format and prepended to the filter so that the receiver can decode it. Here N is 85, which encodes as the single byte 0x55. This is done as follows:
fd := filter.Bytes()

var buffer bytes.Buffer
buffer.Grow(wire.VarIntSerializeSize(uint64(N)) + len(fd))

// Write N as a CompactSize prefix, then append the filter bytes.
err = wire.WriteVarInt(&buffer, 0, uint64(N))
if err != nil {
    return nil, err
}

_, err = buffer.Write(fd)
if err != nil {
    return nil, err
}
If we now print out the filter again, we will find that it is exactly the same as bitcoind's result:
5571d126b85aa79c9de56d55995aa292de0484b830680a735793a8c2260113148421279906f800c3b8c94ff37681fb1fd230482518c52df57437864023833f2f801639692646ddcd7976ae4f2e2a1ef58c79b3aed6a705415255e362581692831374a5e5e70d5501cdc0a52095206a15cd2eb98ac980c22466e6945a65a5b0b0c5b32aa1e0cda2545da2c4345e049b614fcad80b9dc9c903788163822f4361bbb8755b79c276b1cf7952148de1e5ee0a92f6d70c4f522aa6877558f62b34b56ade12fa2e61023abf3e570937bf379722bc1b0dc06ffa1c5835bb651b9346a270
Success!
However, my personal understanding is that there is no need to add N to the filter: if you know the value of P, you can recover N. Let's see if we can restore the original list of numbers using the first filter output (the one without the N prefix):
b := bstream.NewBStreamReader(filter)

var (
    numbers []uint64
    prevNum uint64
)

for {
    // Read the quotient from the bit stream. The number of 1s
    // before the first 0 is the quotient; the 0 marks its end.
    var q uint64
    c, err := b.ReadBit()
    if err != nil {
        return err
    }
    for c {
        q++
        c, err = b.ReadBit()
        if errors.Is(err, io.EOF) {
            break
        } else if err != nil {
            return err
        }
    }

    // The next P bits encode the remainder.
    r, err := b.ReadBits(P)
    if errors.Is(err, io.EOF) {
        break
    } else if err != nil {
        return err
    }

    // Reconstruct the difference, then the original number.
    n := q*uint64(math.Exp2(float64(P))) + r

    num := n + prevNum
    numbers = append(numbers, num)
    prevNum = num
}

fmt.Println(numbers)
The above produces the same list of numbers, so we can reconstruct the list without knowing N. So I'm not sure what the rationale is for including N in the filter. If you know why N must be included, please let me know!
Thanks for reading!