Skip to main content

Merkle Prover

This section explains the prover of the Merkle commitment scheme, focusing on how columns are committed to compute a Merkle root and how the Merkle tree layers are constructed, as well as how to generate Merkle proofs (decommitments). S-two represents the computation trace using multiple columns of different sizes. Rather than committing to each column in separate Merkle trees, S-two uses a modified Merkle tree structure where all columns of different sizes are committed to using a single Merkle tree. We will describe the structure of this Merkle tree in this section.

MerkleProver Structure

The MerkleProver struct represents a prover for a Merkle commitment scheme. It stores all layers of the Merkle tree, where each layer contains the hash values at that level.
pub struct MerkleProver<B: MerkleOps<H>, H: MerkleHasher> {
    /// Layers of the Merkle tree.
    /// The first layer is the root layer.
    /// The last layer is the largest layer.
    /// See [MerkleOps::commit_on_layer] for more details.
    pub layers: Vec<Col<B, H::Hash>>,
}

The Commit Process

The core of the Merkle prover is the commit function, which takes as input a set of columns and outputs a MerkleProver containing all layers of the Merkle tree. The columns must be of length that is a power of 2. Below is the complete implementation of the commit function:
    pub fn commit(columns: Vec<&Col<B, BaseField>>) -> Self {
        let _span = span!(Level::TRACE, "Merkle", class = "MerkleCommitment").entered();
        if columns.is_empty() {
            return Self {
                layers: vec![B::commit_on_layer(0, None, &[])],
            };
        }

        let columns = &mut columns
            .into_iter()
            .sorted_by_key(|c| Reverse(c.len()))
            .peekable();

        let mut layers: Vec<Col<B, H::Hash>> = Vec::new();

        let max_log_size = columns.peek().unwrap().len().ilog2();
        for log_size in (0..=max_log_size).rev() {
            // Take columns of the current log_size.
            let layer_columns = columns
                .peek_take_while(|column| column.len().ilog2() == log_size)
                .collect_vec();

            layers.push(B::commit_on_layer(log_size, layers.last(), &layer_columns));
        }
        layers.reverse();
        Self { layers }
    }
Let’s walk through the function step by step:
  1. Sort Columns by Length: Columns are sorted in descending order of length (columns with the most elements appear first). This ensures that the largest columns are committed first.
  2. Layer-by-Layer Commitment:
    • For each layer (from largest to smallest), the function collects all columns of the current size and computes the hashes for that layer using the commit_on_layer function.
    • For the largest layer, the previous layer’s hashes are empty, so the hash is computed directly from the column values.
    • For subsequent layers, the hash is computed from the previous layer’s hashes and the current layer’s column values.
  3. Reverse Layers: After all layers are computed, the list of layers is reversed so that the root layer is at the beginning.

Example: Commit Process

Suppose the input column data is as shown below:
Figure 1: Example column data to commit using a Merkle tree
For this example, the columns are already in the sorted order: Column 0\textcolor{red}{\text{\textit{Column} 0}}, Column 1\textcolor{green}{\text{\textit{Column} 1}}, Column 2\textcolor{blue}{\text{\textit{Column} 2}} (from longest to shortest). We will now compute the hashes stored at each layer.
  • First Layer (Leaves): The hashes are computed directly from the column values: [h00,h01,h10,h11]=[H(a,p),H(b,q),H(c,r),H(d,s)][h_{00}, h_{01}, h_{10}, h_{11}] = [H(\textcolor{red}{a}, \textcolor{green}{p}), H(\textcolor{red}{b}, \textcolor{green}{q}), H(\textcolor{red}{c}, \textcolor{green}{r}), H(\textcolor{red}{d}, \textcolor{green}{s})]
  • Second Layer: The next layer uses the previous hashes and the values from Column 2\textcolor{blue}{\text{Column 2}}: [h0,h1]=[H(h00,h01,u),H(h10,h11,v)][h_0, h_1] = [H(h_{00}, h_{01}, \textcolor{blue}{u}), H(h_{10}, h_{11}, \textcolor{blue}{v})]
  • Root: The root is computed as root=H(h0,h1)root = H(h_0, h_1).
The resulting Merkle tree is illustrated below:
Figure 2: Merkle tree structure after commitment

The Decommit Process

The decommitment process enables the prover to generate a Merkle proof for a set of queried indices, allowing the verifier to check that specific elements are included in the committed Merkle tree. The output is a MerkleDecommitment struct, which contains the hash and column values required for the verifier to reconstruct the Merkle root at the queried positions. Its implementation is shown below:
pub struct MerkleDecommitment<H: MerkleHasher> {
    /// Hash values that the verifier needs but cannot deduce from previous computations, in the
    /// order they are needed.
    pub hash_witness: Vec<H::Hash>,
    /// Column values that the verifier needs but cannot deduce from previous computations, in the
    /// order they are needed.
    /// This complements the column values that were queried. These must be supplied directly to
    /// the verifier.
    pub column_witness: Vec<BaseField>,
}
The decommit function implemented for the MerkleProver takes as input:
  • queries_per_log_size: A map from log size to a vector of query indices for columns of that size.
  • columns: The column data that was committed to in the Merkle tree.
It returns:
  • A vector of queried values, ordered as they are opened (from largest to smallest layer).
  • A MerkleDecommitment containing the hash and column witnesses needed for verification.
Below is the complete implementation of the decommit function:
    pub fn decommit(
        &self,
        queries_per_log_size: &BTreeMap<u32, Vec<usize>>,
        columns: Vec<&Col<B, BaseField>>,
    ) -> (Vec<BaseField>, MerkleDecommitment<H>) {
        // Prepare output buffers.
        let mut queried_values = vec![];
        let mut decommitment = MerkleDecommitment::empty();

        // Sort columns by layer.
        let mut columns_by_layer = columns
            .iter()
            .sorted_by_key(|c| Reverse(c.len()))
            .peekable();

        let mut last_layer_queries = vec![];
        for layer_log_size in (0..self.layers.len() as u32).rev() {
            // Prepare write buffer for queries to the current layer. This will propagate to the
            // next layer.
            let mut layer_total_queries = vec![];

            // Each layer node is a hash of column values as previous layer hashes.
            // Prepare the relevant columns and previous layer hashes to read from.
            let layer_columns = columns_by_layer
                .peek_take_while(|column| column.len().ilog2() == layer_log_size)
                .collect_vec();
            let previous_layer_hashes = self.layers.get(layer_log_size as usize + 1);

            // Queries to this layer come from queried node in the previous layer and queried
            // columns in this one.
            let mut prev_layer_queries = last_layer_queries.into_iter().peekable();
            let mut layer_column_queries =
                option_flatten_peekable(queries_per_log_size.get(&layer_log_size));

            // Merge previous layer queries and column queries.
            while let Some(node_index) =
                next_decommitment_node(&mut prev_layer_queries, &mut layer_column_queries)
            {
                if let Some(previous_layer_hashes) = previous_layer_hashes {
                    // If the left child was not computed, add it to the witness.
                    if prev_layer_queries.next_if_eq(&(2 * node_index)).is_none() {
                        decommitment
                            .hash_witness
                            .push(previous_layer_hashes.at(2 * node_index));
                    }

                    // If the right child was not computed, add it to the witness.
                    if prev_layer_queries
                        .next_if_eq(&(2 * node_index + 1))
                        .is_none()
                    {
                        decommitment
                            .hash_witness
                            .push(previous_layer_hashes.at(2 * node_index + 1));
                    }
                }

                // If the column values were queried, return them.
                let node_values = layer_columns.iter().map(|c| c.at(node_index));
                if layer_column_queries.next_if_eq(&node_index).is_some() {
                    queried_values.extend(node_values);
                } else {
                    // Otherwise, add them to the witness.
                    decommitment.column_witness.extend(node_values);
                }

                layer_total_queries.push(node_index);
            }

            // Propagate queries to the next layer.
            last_layer_queries = layer_total_queries;
        }

        (queried_values, decommitment)
    }
Let’s break down the function step by step:
  1. Sort Columns by Length:
    • As in the commit function, columns are sorted in descending order of length.
  2. Layer-by-Layer Decommitment:
    • For each layer (from largest to smallest):
      • Collect all columns of the current size (layer_columns) and the previous layer’s hashes (previous_layer_hashes).
      • Retrieve the queries for the current layer (layer_column_queries) and the previous layer’s queries (prev_layer_queries).
      • For each node index to be decommitted in this layer:
        • Check if the child node hashes of the current node can be computed by the verifier, and if not, add the missing child node hashes to the hash_witness in the decommitment.
        • If the node index is queried, fetch the corresponding column values and append them to queried_values.
        • If not queried, add the column values to the column_witness in the decommitment.
      • The set of node indices decommitted in this layer is propagated as queries to the next layer.

Example: Decommit Process

For the column data in Figure 1, consider the query indices [(2,[0]),(1,[1])][(2, [0]), (1, [1])], where the query indices are maps from log size to a vector of query indices for columns of that size. This corresponds to querying the following elements:
  • The 0th element of columns of size 22=42^2=4: a\textcolor{red}{a} from Column 0\textcolor{red}{\text{\textit{Column} 0}} and p\textcolor{green}{p} from Column 1\textcolor{green}{\text{\textit{Column} 1}}.
  • The 1st element of the column of size 21=22^1=2: v\textcolor{blue}{v} from Column 2\textcolor{blue}{\text{\textit{Column} 2}}.
Because columns of equal length are committed together, the same indices are opened together in the decommitment. For example, for query (2,[0])(2, [0]), both a\textcolor{red}{a} and p\textcolor{green}{p} are opened together. Below is a walkthrough of the main loop in the decommit function, showing the state of key variables for each layer_log_size:
  • layer_log_size = 2 (columns of size 4):
    • layer_columns: [Column 0,Column 1][\textcolor{red}{\text{\textit{Column} 0}}, \textcolor{green}{\text{\textit{Column} 1}}]
    • previous_layer_hashes: None (first layer)
    • prev_layer_queries: empty
    • layer_column_queries: [0][0]
    • For node_index = 0:
      • node_values: [a,p][\textcolor{red}{a}, \textcolor{green}{p}]
      • Queried, so append to queried_values: [a,p][\textcolor{red}{a}, \textcolor{green}{p}]
      • Add node_index = 0 to layer_total_queries (to propagate to next layer)
    • At this stage: queried_values = [a,p][\textcolor{red}{a}, \textcolor{green}{p}], decommitment is empty.
  • layer_log_size = 1 (columns of size 2):
    • layer_columns: [Column 2][\textcolor{blue}{\text{\textit{Column} 2}}]
    • previous_layer_hashes: [h00,h01,h10,h11][h_{00}, h_{01}, h_{10}, h_{11}]
    • prev_layer_queries: [0][0]
    • layer_column_queries: [1][1]
    • For node_index = 0:
      • Add h01h_{01} to hash_witness in decommitment.
      • node_values: [u][\textcolor{blue}{u}]
      • Not queried, so append to column_witness in decommitment.
      • Add node_index = 0 to layer_total_queries.
    • At this stage: queried_values = [a,p][\textcolor{red}{a}, \textcolor{green}{p}], hash_witness = [h01][h_{01}], column_witness = [u][\textcolor{blue}{u}], layer_total_queries = [0][0].
    • For node_index = 1:
      • Add h10,h11h_{10}, h_{11} to hash_witness in decommitment.
      • node_values: [v][\textcolor{blue}{v}]
      • Queried, so append to queried_values.
      • Add node_index = 1 to layer_total_queries.
    • At this stage: queried_values = [a,p,v][\textcolor{red}{a}, \textcolor{green}{p}, \textcolor{blue}{v}], hash_witness = [h01,h10,h11][h_{01}, h_{10}, h_{11}], column_witness = [u][\textcolor{blue}{u}], layer_total_queries = [0,1][0, 1] (to propagate to next layer).
  • layer_log_size = 0 (root):
    • layer_columns: empty
    • previous_layer_hashes: [h0,h1][h_{0}, h_{1}]
    • prev_layer_queries: [0,1][0, 1]
    • layer_column_queries: empty
    • No values are added to queried_values, hash_witness, or column_witness in this layer.
Final output:
  • queried_values: [a,p,v][\textcolor{red}{a}, \textcolor{green}{p}, \textcolor{blue}{v}]
  • decommitment:
    • hash_witness: [h01,h10,h11][h_{01}, h_{10}, h_{11}]
    • column_witness: [u][\textcolor{blue}{u}]
In the next section, we describe the verification process to verify the inclusion of the queried values using the Merkle decommitment.