Halo2 Lookup Protocol

diving into the halo2 lookup protocol.

Introduction

This is a way to prove something about two lists of data (or “columns” in cryptographic terms) called $A$ and $S$ . Imagine $A$ as a list of items you’re claiming something about, and $S$ as a reference list you’re checking against. The goal is to prove that every item in $A$ exists somewhere in $S$ —in other words, $A$ is a subset of $S$ . But there’s more to it: $S$ might have extra items not in $A$ , and both lists can have repeats. Plus, $S$ can either be a fixed list everyone agrees on beforehand or a variable list that’s part of the proof itself. Let’s unpack this carefully

Key Components

Two Columns: (A) and (S)
- $(A)$ is the “claim” column—think of it as the list you’re testing.
- $(S)$ is the “reference” column—the list you’re checking against.
- Both have a length of $(2^k)$ , where $(k)$ is some integer (e.g., if $(k = 2)$ , the length is $4$ ). This power-of-two size is common in cryptographic protocols because it aligns with mathematical structures like roots of unity.
The Goal of the Protocol
- Prove every cell in (A) is in (S): Every value in $(A)$ must appear at least once in $(S)$ . For example, if $(A = [1, 2])$ , then $(S)$ must include both $1$ and $2$ somewhere (e.g., $(S = [3, 1, 2, 4])$ works).
- Some cells in (S) can not be in (A): (S) can have extra values that don’t appear in (A). In the example above, 3 and 4 in (S) aren’t in (A), and that’s fine.
- This is a subset relationship $((A \subseteq S))$ , but with a twist: we’re not checking strict equality or exact matches—just that (A)’s values are covered by $(S)$ .
Fixed or Variable (S)
- Fixed (S): $(S)$ could be a predefined list, like a public database everyone trusts. The protocol proves (A) fits within this known $(S)$ .
- Variable (S): $(S)$ could be part of the “trace” (a term from zero-knowledge proofs meaning the computation’s record). Here, $(S)$ isn’t fixed in advance—it’s provided by the prover as part of the proof, and its values can vary. This makes the protocol more flexible but also trickier to verify.
Duplicates Allowed
- Both $(A)$ and $(S)$ can have repeated values. For instance, $(A = [1, 1, 2])$ and $(S = [1, 2, 2, 3])$ are valid, and the protocol should still work (1 and 2 from (A) are in (S)).
Padding to (2^k)
- If (A) or (S) isn’t already $(2^k)$ long, they need to be extended:
  - (A): Fill it with “some data from (S)”. If $(A = [1, 2])$ and $(k = 2)$ , it might become $[1, 2, 3, 3]$ by borrowing 3 from $(S = [3, 3])$ .
  - (S): Fill it with its last value. If $(S = [3, 4])$ , it becomes $[3, 4, 4, 4]$ .
- Why $(2^k)$ ? This matches the size of a mathematical group (like $(H)$ in the protocol process), making polynomial operations easier.

What’s Being Proved?

The protocol isn’t just checking if $(A)$ fits in $(S)$ in a simple way—it’s doing so in a cryptographic context, likely for a zero-knowledge proof system like Halo2. This means:

The prover (who knows $(A)$ and $(S)$ ) convinces the verifier (who doesn’t fully see the lists) that $(A \subseteq S)$ .
It’s efficient and secure, using polynomials and permutations (as we’ll see in the protocol process).

The flexibility of (S) being fixed or variable is key:

Fixed (S) might be used to prove something against a standard table (e.g., “all my transaction amounts are valid denominations”).
Variable (S) might be used in a computation where $(S)$ is generated on the fly (e.g., “my outputs match some inputs I computed”).

A Simple Example

Let’s try a small case:

$(k = 2)$ , so length = $4$ .
Original $(A = [1, 2]), (S = [1, 3])$ .
Pad $(A)$ to $[1, 2, 3, 3]$ (taking 3 from (S)).
Pad (S) to $[1, 3, 3, 3]$ .
Goal: Prove $[1, 2]$ ’s values (1 and 2) are in $[1, 3]$ . Oops—2 isn’t in (S)! This hints at a potential issue we’ll explore later.

Protocol

The Problem We’re Solving

Imagine you have two lists:

A: A list of values we want to check (e.g., $[1, 2]$ ).
S: A reference list (e.g., $[3, 4]$ ).

The goal is to prove that every value in A appears in S (i.e., A is a subset of $S$ ), even though $S$ might have extra values not in $A$ , and both lists can have repeats. The lists might not be long enough, so we’ll pad them to a length of $(2^k)$ (e.g., 4 if $(k = 2)$ ). The protocol uses polynomials and permutations over a mathematical structure called $(H)$ , which is a set of $(2^k$ ) elements: $(H = {1, w, w^2, ..., w^{2^k-1}})$ , where $(w)$ is a root of unity (think of it as points evenly spaced around a circle in the complex plane).

Padding A and S

Since A and S might be shorter than $(2^k)$ , we need to extend them:

Pad A: Fill it with values from S until it reaches length $(2^k)$ . For example, if $A = [1, 2]$ and $S = [3, 4]$ , with $(k = 2)$ , A becomes $[1, 2, 4, 4]$ (taking $4$ from $S$ ).
Pad S: Fill it with its last value. So S = $[3, 4]$ becomes $[3, 4, 4, 4]$ .

This ensures both lists are the right length, but it raises a question we’ll revisit later: does padding A with S’s values weaken the subset proof?

Step 2: Creating Permutations A' and S'

Now, the prover $(P)$ rearranges $A$ and $S$ into new lists, $A'$ and $S$ ', with specific rules:

A': Sort A so identical values are adjacent, and different values are ordered randomly. For $[1, 2, 4, 4]$ , A' might be $[1, 2, 4, 4]$ (already sorted, same values together).
S': Arrange S so that for every new value’s first appearance in A', S' matches it at the same position. Remaining positions in S' are filled randomly from S. For example:
- A' = [1, 2, 4, 4]
- S' might be [1, 2, 4, 4] (matching A' where new values start: 1 at position 0, 2 at 1, 4 at 2).

Additionally, A' and S' must satisfy:

For each position (i), either (A'_i = S'_i) or (A'i = A'{i-1}).

This rule ensures A' either follows S' or repeats its previous value, which helps prove that A’s values are “covered” by S.

Step 3: Computing the Z Polynomial

The prover computes a polynomial (Z) to tie everything together. Think of (Z) as a running product that accumulates evidence across the rows:

Boundary Conditions: (Z_0 = 1) and (Z_{2^k} = 1).
Recurrence: For (i) from 0 to (2^k - 1), [ Z_{i+1} = Z_i \times \frac{(A_i + \beta)(S_i + \gamma)}{(A'_i + \beta)(S'_i + \gamma)} ] where (\beta) and (\gamma) are random challenges (numbers) chosen by the verifier to prevent cheating.

For our example:

A = [1, 2, 4, 4], S = [3, 4, 4, 4]
A' = [1, 2, 4, 4], S' = [1, 2, 4, 4]
Suppose (\beta = 2), (\gamma = 3):
- (Z_0 = 1)
- (Z_1 = Z_0 \times \frac{(1+2)(3+3)}{(1+2)(1+3)} = 1 \times \frac{3 \times 6}{3 \times 4} = 1.5)
- (Z_2 = Z_1 \times \frac{(2+2)(4+3)}{(2+2)(2+3)} = 1.5 \times \frac{4 \times 7}{4 \times 5} = 2.1)
- And so on...

The idea is that if A’s values are in S, and A' and S' are consistent, (Z) will return to 1 by the end.

Step 4: Verifier Checks A' and S'

The verifier (V) checks two conditions over all points in (H) (i.e., for each position):

[ (A'(X) - S'(X)) \cdot (A'(X) - A'(w^{-1}X)) = 0 ]
- This means at every position, either (A'_i = S'_i) or (A'i = A'{i-1}).
- In our example, A' = S' = [1, 2, 4, 4], so (A'_i - S'_i = 0), and the condition holds.
[ L_0(X) \cdot (A'(X) - S'(X)) = 0 ]
- (L_0(X)) is a Lagrange polynomial that’s 1 at the first position (X = 1) and 0 elsewhere. This forces (A'_0 = S'_0) (here, 1 = 1).

These ensure A' and S' align as promised.

Step 5: Verifier Checks Z

The verifier also checks (Z) with two equations:

[ Z(wX) \cdot (A'(X) + \beta)(S'(X) + \gamma) - Z(X) \cdot (A(X) + \beta)(S(X) + \gamma) = 0 ]
- This confirms the recurrence relation holds across all positions. It’s the polynomial version of the (Z_{i+1} = Z_i \times ...) rule.
[ L_0(X) \cdot (1 - Z(X)) = 0 ]
- At X = 1 (first position), (Z_0 = 1), so (1 - Z_0 = 0). This checks the starting condition.

If both sets of equations (from Steps 4 and 5) hold, the verifier outputs “accept.”

Why Does This Prove A ⊆ S?

The magic lies in the interplay:

A' and S' being permutations of A and S, with A' matching S' on first occurrences, suggests all values in A appear in S.
The (Z) polynomial enforces consistency: if A had a value not in S, the prover couldn’t construct valid A' and S' permutations without breaking the equations.
The padding and permutation rules aim to ensure A’s values are “witnessed” in S.

Your Concern: Padding and Validity

You pointed out a flaw: if A = [1, 2] and S = [3, 4], padding A to [1, 2, 4, 4] and S to [3, 4, 4, 4], then setting A' = S' = [1, 2, 4, 4] might pass, even though [1, 2] isn’t a subset of [3, 4]. The protocol checks the padded versions, not the originals directly. This is a limitation: it assumes the prover pads honestly, but a dishonest prover could sneak in values. A fix would be to commit to the original S and prove A’s original values are in it, which Halo2 might do in a broader context (e.g., with additional constraints).

Summary

The protocol pads A and S, permutes them into A' and S', computes (Z), and verifies polynomial equations to prove A ⊆ S. It’s elegant but relies on honest padding. For a real system, extra checks would ensure the original subset property holds. What do you think—does this clarify the process, or is there a specific part you’d like to zoom in on?

PreviousPLOOKUP NextHYPERPLONK

Last updated 3 months ago