Merkle Prover

This section explains the prover of the Merkle commitment scheme, focusing on how columns are committed to compute a Merkle root and how the Merkle tree layers are constructed, as well as how to generate Merkle proofs (decommitments). S-two represents the computation trace using multiple columns of different sizes. Rather than committing to each column in separate Merkle trees, S-two uses a modified Merkle tree structure where all columns of different sizes are committed to using a single Merkle tree. We will describe the structure of this Merkle tree in this section.

MerkleProver Structure

The MerkleProver struct represents a prover for a Merkle commitment scheme. It stores all layers of the Merkle tree, where each layer contains the hash values at that level.

pub struct MerkleProver<B: MerkleOps<H>, H: MerkleHasher> {
    /// Layers of the Merkle tree.
    /// The first layer is the root layer.
    /// The last layer is the largest layer.
    /// See [MerkleOps::commit_on_layer] for more details.
    pub layers: Vec<Col<B, H::Hash>>,
}

The Commit Process

The core of the Merkle prover is the commit function, which takes as input a set of columns and outputs a MerkleProver containing all layers of the Merkle tree. The columns must be of length that is a power of 2. Below is the complete implementation of the commit function:

    pub fn commit(columns: Vec<&Col<B, BaseField>>) -> Self {
        let _span = span!(Level::TRACE, "Merkle", class = "MerkleCommitment").entered();
        if columns.is_empty() {
            return Self {
                layers: vec![B::commit_on_layer(0, None, &[])],
            };
        }

        let columns = &mut columns
            .into_iter()
            .sorted_by_key(|c| Reverse(c.len()))
            .peekable();

        let mut layers: Vec<Col<B, H::Hash>> = Vec::new();

        let max_log_size = columns.peek().unwrap().len().ilog2();
        for log_size in (0..=max_log_size).rev() {
            // Take columns of the current log_size.
            let layer_columns = columns
                .peek_take_while(|column| column.len().ilog2() == log_size)
                .collect_vec();

            layers.push(B::commit_on_layer(log_size, layers.last(), &layer_columns));
        }
        layers.reverse();
        Self { layers }
    }

Let’s walk through the function step by step:

Sort Columns by Length: Columns are sorted in descending order of length (columns with the most elements appear first). This ensures that the largest columns are committed first.
Layer-by-Layer Commitment:
- For each layer (from largest to smallest), the function collects all columns of the current size and computes the hashes for that layer using the commit_on_layer function.
- For the largest layer, the previous layer’s hashes are empty, so the hash is computed directly from the column values.
- For subsequent layers, the hash is computed from the previous layer’s hashes and the current layer’s column values.
Reverse Layers: After all layers are computed, the list of layers is reversed so that the root layer is at the beginning.

Example: Commit Process

Suppose the input column data is as shown below:

Figure 1: Example column data to commit using a Merkle tree

For this example, the columns are already in the sorted order:

\textcolor{red}{\text{\textit{Column} 0}}

\textcolor{green}{\text{\textit{Column} 1}}

\textcolor{blue}{\text{\textit{Column} 2}}

(from longest to shortest). We will now compute the hashes stored at each layer.

First Layer (Leaves): The hashes are computed directly from the column values: $[h_{00}, h_{01}, h_{10}, h_{11}] = [H(\textcolor{red}{a}, \textcolor{green}{p}), H(\textcolor{red}{b}, \textcolor{green}{q}), H(\textcolor{red}{c}, \textcolor{green}{r}), H(\textcolor{red}{d}, \textcolor{green}{s})]$
Second Layer: The next layer uses the previous hashes and the values from $\textcolor{blue}{\text{Column 2}}$ : $[h_0, h_1] = [H(h_{00}, h_{01}, \textcolor{blue}{u}), H(h_{10}, h_{11}, \textcolor{blue}{v})]$
Root: The root is computed as $root = H(h_0, h_1)$ .

The resulting Merkle tree is illustrated below:

Figure 2: Merkle tree structure after commitment

The Decommit Process

The decommitment process enables the prover to generate a Merkle proof for a set of queried indices, allowing the verifier to check that specific elements are included in the committed Merkle tree. The output is a MerkleDecommitment struct, which contains the hash and column values required for the verifier to reconstruct the Merkle root at the queried positions. Its implementation is shown below:

pub struct MerkleDecommitment<H: MerkleHasher> {
    /// Hash values that the verifier needs but cannot deduce from previous computations, in the
    /// order they are needed.
    pub hash_witness: Vec<H::Hash>,
    /// Column values that the verifier needs but cannot deduce from previous computations, in the
    /// order they are needed.
    /// This complements the column values that were queried. These must be supplied directly to
    /// the verifier.
    pub column_witness: Vec<BaseField>,
}

The decommit function implemented for the MerkleProver takes as input:

queries_per_log_size: A map from log size to a vector of query indices for columns of that size.
columns: The column data that was committed to in the Merkle tree.

It returns:

A vector of queried values, ordered as they are opened (from largest to smallest layer).
A MerkleDecommitment containing the hash and column witnesses needed for verification.

Below is the complete implementation of the decommit function:

    pub fn decommit(
        &self,
        queries_per_log_size: &BTreeMap<u32, Vec<usize>>,
        columns: Vec<&Col<B, BaseField>>,
    ) -> (Vec<BaseField>, MerkleDecommitment<H>) {
        // Prepare output buffers.
        let mut queried_values = vec![];
        let mut decommitment = MerkleDecommitment::empty();

        // Sort columns by layer.
        let mut columns_by_layer = columns
            .iter()
            .sorted_by_key(|c| Reverse(c.len()))
            .peekable();

        let mut last_layer_queries = vec![];
        for layer_log_size in (0..self.layers.len() as u32).rev() {
            // Prepare write buffer for queries to the current layer. This will propagate to the
            // next layer.
            let mut layer_total_queries = vec![];

            // Each layer node is a hash of column values as previous layer hashes.
            // Prepare the relevant columns and previous layer hashes to read from.
            let layer_columns = columns_by_layer
                .peek_take_while(|column| column.len().ilog2() == layer_log_size)
                .collect_vec();
            let previous_layer_hashes = self.layers.get(layer_log_size as usize + 1);

            // Queries to this layer come from queried node in the previous layer and queried
            // columns in this one.
            let mut prev_layer_queries = last_layer_queries.into_iter().peekable();
            let mut layer_column_queries =
                option_flatten_peekable(queries_per_log_size.get(&layer_log_size));

            // Merge previous layer queries and column queries.
            while let Some(node_index) =
                next_decommitment_node(&mut prev_layer_queries, &mut layer_column_queries)
            {
                if let Some(previous_layer_hashes) = previous_layer_hashes {
                    // If the left child was not computed, add it to the witness.
                    if prev_layer_queries.next_if_eq(&(2 * node_index)).is_none() {
                        decommitment
                            .hash_witness
                            .push(previous_layer_hashes.at(2 * node_index));
                    }

                    // If the right child was not computed, add it to the witness.
                    if prev_layer_queries
                        .next_if_eq(&(2 * node_index + 1))
                        .is_none()
                    {
                        decommitment
                            .hash_witness
                            .push(previous_layer_hashes.at(2 * node_index + 1));
                    }
                }

                // If the column values were queried, return them.
                let node_values = layer_columns.iter().map(|c| c.at(node_index));
                if layer_column_queries.next_if_eq(&node_index).is_some() {
                    queried_values.extend(node_values);
                } else {
                    // Otherwise, add them to the witness.
                    decommitment.column_witness.extend(node_values);
                }

                layer_total_queries.push(node_index);
            }

            // Propagate queries to the next layer.
            last_layer_queries = layer_total_queries;
        }

        (queried_values, decommitment)
    }

Let’s break down the function step by step:

Sort Columns by Length:
- As in the commit function, columns are sorted in descending order of length.
Layer-by-Layer Decommitment:
- For each layer (from largest to smallest):
  - Collect all columns of the current size (layer_columns) and the previous layer’s hashes (previous_layer_hashes).
  - Retrieve the queries for the current layer (layer_column_queries) and the previous layer’s queries (prev_layer_queries).
  - For each node index to be decommitted in this layer:
    - Check if the child node hashes of the current node can be computed by the verifier, and if not, add the missing child node hashes to the hash_witness in the decommitment.
    - If the node index is queried, fetch the corresponding column values and append them to queried_values.
    - If not queried, add the column values to the column_witness in the decommitment.
  - The set of node indices decommitted in this layer is propagated as queries to the next layer.

Example: Decommit Process

For the column data in Figure 1, consider the query indices

[(2, [0]), (1, [1])]

, where the query indices are maps from log size to a vector of query indices for columns of that size. This corresponds to querying the following elements:

The 0th element of columns of size $2^2=4$ : $\textcolor{red}{a}$ from $\textcolor{red}{\text{\textit{Column} 0}}$ and $\textcolor{green}{p}$ from $\textcolor{green}{\text{\textit{Column} 1}}$ .
The 1st element of the column of size $2^1=2$ : $\textcolor{blue}{v}$ from $\textcolor{blue}{\text{\textit{Column} 2}}$ .

Because columns of equal length are committed together, the same indices are opened together in the decommitment. For example, for query

(2, [0])

, both

\textcolor{red}{a}

and

\textcolor{green}{p}

are opened together. Below is a walkthrough of the main loop in the decommit function, showing the state of key variables for each layer_log_size:

layer_log_size = 2 (columns of size 4):
- layer_columns: $[\textcolor{red}{\text{\textit{Column} 0}}, \textcolor{green}{\text{\textit{Column} 1}}]$
- previous_layer_hashes: None (first layer)
- prev_layer_queries: empty
- layer_column_queries: $[0]$
- For node_index = 0:
  - node_values: $[\textcolor{red}{a}, \textcolor{green}{p}]$
  - Queried, so append to queried_values: $[\textcolor{red}{a}, \textcolor{green}{p}]$
  - Add node_index = 0 to layer_total_queries (to propagate to next layer)
- At this stage: queried_values = $[\textcolor{red}{a}, \textcolor{green}{p}]$ , decommitment is empty.
layer_log_size = 1 (columns of size 2):
- layer_columns: $[\textcolor{blue}{\text{\textit{Column} 2}}]$
- previous_layer_hashes: $[h_{00}, h_{01}, h_{10}, h_{11}]$
- prev_layer_queries: $[0]$
- layer_column_queries: $[1]$
- For node_index = 0:
  - Add $h_{01}$ to hash_witness in decommitment.
  - node_values: $[\textcolor{blue}{u}]$
  - Not queried, so append to column_witness in decommitment.
  - Add node_index = 0 to layer_total_queries.
- At this stage: queried_values = $[\textcolor{red}{a}, \textcolor{green}{p}]$ , hash_witness = $[h_{01}]$ , column_witness = $[\textcolor{blue}{u}]$ , layer_total_queries = $[0]$ .
- For node_index = 1:
  - Add $h_{10}, h_{11}$ to hash_witness in decommitment.
  - node_values: $[\textcolor{blue}{v}]$
  - Queried, so append to queried_values.
  - Add node_index = 1 to layer_total_queries.
- At this stage: queried_values = $[\textcolor{red}{a}, \textcolor{green}{p}, \textcolor{blue}{v}]$ , hash_witness = $[h_{01}, h_{10}, h_{11}]$ , column_witness = $[\textcolor{blue}{u}]$ , layer_total_queries = $[0, 1]$ (to propagate to next layer).
layer_log_size = 0 (root):
- layer_columns: empty
- previous_layer_hashes: $[h_{0}, h_{1}]$
- prev_layer_queries: $[0, 1]$
- layer_column_queries: empty
- No values are added to queried_values, hash_witness, or column_witness in this layer.

Final output:

queried_values: $[\textcolor{red}{a}, \textcolor{green}{p}, \textcolor{blue}{v}]$
decommitment:
- hash_witness: $[h_{01}, h_{10}, h_{11}]$
- column_witness: $[\textcolor{blue}{u}]$

In the next section, we describe the verification process to verify the inclusion of the queried values using the Merkle decommitment.

Protocol

S-two book

Cheatsheets

MerkleProver Structure

The Commit Process

Example: Commit Process

The Decommit Process

Example: Decommit Process

Protocol

S-two book

Cheatsheets

​MerkleProver Structure

​The Commit Process

​Example: Commit Process

​The Decommit Process

​Example: Decommit Process

MerkleProver Structure

The Commit Process

Example: Commit Process

The Decommit Process

Example: Decommit Process