Table 1: Comparison of zero-knowledge proof technologies for compression of non-deletion proofs.

ZK Stack             | Hash Function | Proving Speed (tx/s) | Proof Size | Proof Size Asymptotics | Trusted Setup | Impl. Effort
None ("hash based")  | SHA-256       | 10 000*              | 10 MB      | O(n)                   | No            | N/A
CIRCOM + Groth16     | Poseidon      | 25                   | 250 B      | O(1)                   | Yes           | Lower
Gnark + Groth16      | Poseidon      | 30                   | 250 B      | O(1)                   | Yes           | Low
SP1 zkVM             | SHA-256       | 1.5                  | 2 MB       | O(log n)               | No            | Lowest
Cairo 0 + STwo       | Poseidon      | 60†                  | 2.4 MB     | O(log n)               | No            | Medium
Cairo + STwo         | Poseidon      | 100                  | 2.4 MB     | O(log n)               | No            | Medium
AIR + Plonky3‡       | Poseidon2     | 10 000               | 1.7 MB     | O(log n)               | No            | High
AIR + Plonky3        | Poseidon2     | 2500                 | 0.7 MB     | O(log n)               | No            | High
AIR + Plonky3        | Blake3        | 250                  | 1.7 MB     | O(log n)               | No            | High

* Bandwidth-limited; no verification effort reduction.
† Trace generation before proving is impractically slow.
‡ See Section 7.5 for details.
Aggregation Layer connects to the Consensus Layer. For fully trustless operation, each request is accompanied by a cryptographic proof of SMT consistency.

Figure 1: Layered architecture of the Unicity Network (from top to bottom: Consensus Layer, Aggregation Layer, Execution Layer).

2.1 Consensus Layer

Decentralization is achieved by a Proof-of-Work (PoW) blockchain instance which manages consensus, including the validator selection for the BFT finality gadget, implementing the native token, executing the tokenomics plan, and handling the validator incentives. PoW is specifically robust during the bootstrapping of a decentralized system: when the number of validators fluctuates, the financial value of tokens is low, and token distribution is relatively concentrated. PoW shows great liveness properties. At the same time, PoW chains do not provide fast and deterministic finality: many blocks of confirmations are needed to achieve a reasonable level of certainty. In Unicity, this is mitigated by including a BFT "finality gadget" which runs rather fast, and the finality of transactions below is defined by the consensus of the BFT cluster.

The PoW layer provides permissionlessness, a core property of decentralized blockchains. Any validator can actively participate in mining, and blocks are chosen based on the longest-chain rule. By selecting a PoW mining puzzle that is resistant to acceleration by GPUs and ASICs (specifically: RandomX [7]), we aim to further democratize participation in the network.

PoW chains encounter rollbacks ("reorgs") when alternative chains with greater cumulative PoW work emerge. Limiting the maximum length of alternative chains creates the risk of involuntary forking: both alternative chains may be too long for a rollback. This risk is specifically mitigated by a finality gadget. On the other hand, PoW chains are extremely robust. If any number of validators leave or join the network, the chain continues to grow, and the block rate eventually adjusts to the new total mining power. In short, PoW trades safety for liveness.

The purpose of the BFT consensus layer is twofold: 1) to provide deterministic (one-block) finality for the layers below, and 2) to achieve a fast and predictable block rate. BFT consensus trades liveness for safety: it is more fragile, as its liveness depends on a supermajority (e.g., two thirds) of validators being online and cooperative at any moment.

The usual way to achieve permissionless BFT consensus is to use a Proof-of-Stake (PoS) setup. This can be delicate, especially during the launch of a blockchain protocol: there are known weaknesses like the "nothing at stake" attack, and a risk of centralization. PoW-based protocols (and longest-chain-rule protocols in general) are more robust and well suited for achieving a wide initial token distribution and establishing token value for effective decentralization.

By combining a PoW chain with a BFT consensus layer, Unicity leverages the desirable properties of both mechanisms. The PoW chain provides decentralization, robustness, and high security for the base currency, while the BFT layer provides fast, deterministic finality for the Aggregation Layer.

In Unicity, the BFT layer operates at a much higher block rate than the PoW chain. Validators for the BFT Consensus Layer are selected infrequently from a pool of recent, high-performing PoW miners, based on a deterministic algorithm and PoW chain content; anyone can execute the algorithm to verify the selection. PoW validators may also delegate their BFT layer validation rights.

Consensus Layer validators receive their block rewards at the ends of epochs. It is possible to increase economic security by implementing slashing based on withheld PoW and Consensus Layer block rewards.

2.1.1 Consensus Roadmap

The introduction of economic security mechanisms is a logical step toward evolving the Consensus Layer into a full Proof-of-Stake (PoS) system, once the chain is stable and token distribution is reasonably diversified. A PoS system would provide stronger economic security for the BFT nodes while being more energy-efficient and environmentally responsible than PoW mining.

The switch to PoS includes the following steps: 1) introducing the staking mechanism to create economic security for the BFT layer, 2) an alternative ledger for the native token, securing and decentralizing the system, and executing the tokenomics plan there, 3) selecting BFT validators based on the stake, 4) adjusting incentives (block rewards, optional slashing), 5) migrating the token balances, and 6) sunsetting the PoW chain.

2.2 Aggregation Layer

The Aggregation Layer implements a global, append-only key-value store that immutably records every spent token state. More specifically, it provides the following services: 1) recording of key-value tuples, where the key identifies a token state and the value records some metadata, 2) returning inclusion proofs of keys, 3) returning non-inclusion proofs of keys not present in the store.

The Aggregation Layer periodically has its state authenticator certified by the Consensus Layer.

The Aggregation Layer is sharded based on keyspace slices and can be made hierarchical, as shown in Figure 2.

Proof of non-deletion: Once a key is set, it has to remain there forever. Every state change of the Aggregation Layer (or a slice thereof) is accompanied by a cryptographic proof establishing that pre-existing keys have not been removed or their values altered; only new keys were added. The size of this proof is logarithmic with respect to the tree's capacity and linear with respect to the size of the inclusion batch. This can be reduced to a constant size using a SNARK. Assuming correct validation of the non-deletion proof and chaining of the Aggregation Layer's state roots by the Consensus Layer, the Aggregation Layer can be considered trustless.

2.3 Execution Layer

The Execution Layer, also known as the Agent Layer, is responsible for executing transactions and other business logic, using the services of the Aggregation Layer and Unicity in general.

3 Security Model of the Aggregation Layer

The Aggregation Layer implements a distributed, authenticated, append-only dictionary data structure. It authenticates incoming state transfer certification requests by verifying that the sender possesses the private key corresponding to the public key that identifies the current token owner. The specific authentication protocol is beyond the scope of this paper.

Definition 1 (Consistency) An append-only accumulator operates in batches B = (k_1, k_2, ..., k_j), accepting new keys. The append-only accumulator is consistent if 1) during the insertion of a batch of updates, no existing element was deleted or modified; 2) it is possible to generate inclusion proofs π^inc_{k ∈ {B_1,...,B_i}} = (v_k ⇝ r, c) for all previously inserted elements, but not for non-existent elements; 3) it is possible to generate non-inclusion proofs π^inc_{k ∉ {B_1,...,B_i}} = (∅_k ⇝ r, c) for all elements not so far inserted into the accumulator, and not for those already inserted.

When instantiated as a Sparse Merkle Tree (SMT), v_k ⇝ r is the hash chain from the value at the k-th position to the root r, and ∅_k ⇝ r denotes the hash chain from the "empty" value at the k-th position to the root r.

After each batch of additions, the new root of the Aggregation Layer's SMT is certified by the BFT finality gadget, ensuring its uniqueness and immutability. This provides a secure trust anchor for all consistency, inclusion, and non-inclusion proofs. The idealized Consensus Layer is modeled as Algorithm 1.

Figure 2: Sharded architecture of the Aggregation Layer.

Figure 3: Security model of the Aggregation Layer. (Token Users submit batches B = (k_1, k_2, ..., k_j) to the Aggregation Layer's SMT and receive inclusion proofs π^inc_{k ∈ {B_1,...,B_i}} = (v_k ⇝ r, c) and non-inclusion proofs π^inc_{k ∉ {B_1,...,B_i}} = (∅_k ⇝ r, c); the Aggregation Layer sends certification requests (r_i, r_{i−1}, π) to the Consensus Layer and receives certificates c = (i, r_i, r_{i−1}; s_cl).)

For efficiency reasons, client requests are processed in batches; the tree is re-calculated and the tree root is certified when a batch is closed. A batch of client requests is denoted as B_i. At the end of each batch, the Aggregation Layer produces its summary root hash r_i and sends it to the Consensus Layer for certification. A certification request (r_i, r_{i−1}, π) includes: 1) the previous state root hash, 2) the new state root hash, 3) a consistency proof of the changes made during the batch, and 4) an authenticator that identifies the operator.

The Consensus Layer certifies the request only if it uniquely extends a previously certified state root and the consistency proof is valid. It returns a certificate c = (i, r_i, r_{i−1}; s_cl), where s_cl is a signature from the Consensus Layer (e.g., a threshold signature from the consensus nodes or a proof of inclusion in a finalized block).

Each state can be extended only once, which prevents forks within the Aggregation Layer. Each subsequent round extends the most recently certified state. We model the Consensus Layer as an oracle, as shown in Algorithm 1.

Algorithm 1 Consensus Layer modeled as an oracle
  function Initialize()
      r⁻ ← ⊥
      i ← 0
  end function
  function CertificationRequest(r_i, r_{i−1}, π)
      if (r_{i−1} ≠ r⁻) ∨ ¬valid(π, r_i, r_{i−1}) then
          return ⊥
      end if
      r⁻ ← r_i
      i ← i + 1
      s_cl ← sig_cl(i, r_i, r_{i−1})
      return c = (i, r_i, r_{i−1}; s_cl)
  end function

The SMT provides users with inclusion and non-inclusion proofs. Each proof is anchored to a state root certified by the Consensus Layer.

The Consensus Layer must guarantee data availability. If recent state roots were lost, it would become impossible to reject duplicate state transition requests, potentially allowing malicious actors to double-spend against an old, un-extendable state.

The Aggregation Layer itself does not require an internal consensus mechanism; protocols like Raft could be used for replication and coordination among its redundant nodes. The decentralized consensus is provided by the external Consensus Layer.

If each state transition is accompanied by a cryptographic proof of non-deletion (see Section 4), the Aggregation Layer can be considered trustless.

3.1 "Maximalist" Security Assumptions

In this model, we assume that users are capable of validating all aspects of system operation that are relevant to their own assets. This level of trustlessness is close to the strong guarantees introduced by Bitcoin [3], where each "client" functions as a full validator, starting from downloading and verifying the blockchain from the genesis block.

The Root of Trust is the PoW blockchain. A maximalist user maintains a full node of this chain. This is relatively lightweight, as the "utility" transactions are executed at the Execution Layer. Upon receiving a token, the user must be able to efficiently verify the following:

1. The token is valid (as elaborated elsewhere),
2. The Aggregation Layer has not forked,
3. The Aggregation Layer has not certified conflicting states.

The second point is addressed by validating a unique state root snapshot embedded in the PoW block header. Since the cumulative state snapshot appears with a delay, the block can only be considered final after a snapshot publishing and block confirmation period; hence, maximalist verification is not instantaneous.

The third point is addressed by auditing the operation of the Aggregation Layer, specifically ensuring that no Inclusion Proofs have been generated for the token that are not reflected in its recorded history. To achieve this, all non-deletion proofs from the token's genesis up to its current state must be validated. This is made efficient through the use of recursive zero-knowledge proofs (ZKPs), which show that each round's non-deletion proof is valid and that no rounds were skipped from verification. These recursive proofs are generated periodically and are made available with some latency.

3.2 Practical Security Assumptions

If we relax the model by assuming that a majority of BFT consensus nodes exhibit economically rational behavior and do not collude maliciously with the Aggregation Layer, the user can enjoy significantly more practical operational parameters. BFT layer forking (case 2 above) or certifying conflicting states (case 3 above) produces strong cryptographic evidence which is processed out of the critical path of serving users.

In this scenario, a transaction is finalized, and an inclusion proof is returned, within a few seconds, allowing the transaction to be independently verified (without consulting external data³) within the same timeframe.

The Root of Trust is the set of epoch change records of the BFT consensus layer. These records grow slowly (a few aggregated signatures per week). When transitioning to proof-of-stake (PoS) consensus (see Section 2.1.1), the Root of Trust remains the same.

³ Previously obtained Root of Trust is used to validate fu-
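The oracle of Algorithm 1 can be transcribed as a minimal Python sketch. The class and parameter names are illustrative; `valid_fn` stands in for consistency-proof verification and `sign_fn` for the Consensus Layer signature.

```python
# Minimal sketch of the Consensus Layer oracle (Algorithm 1).
# valid_fn and sign_fn are placeholders for the real consistency-proof
# check and the Consensus Layer signature scheme.

class ConsensusOracle:
    def __init__(self, valid_fn, sign_fn):
        self.r_prev = None   # r- <- "bottom": no root certified yet
        self.i = 0
        self.valid = valid_fn
        self.sign = sign_fn

    def certification_request(self, r_i, r_prev, pi):
        # Certify only a request that extends the latest certified root
        # and carries a valid consistency proof.
        if r_prev != self.r_prev or not self.valid(pi, r_i, r_prev):
            return None  # reject (bottom)
        self.r_prev = r_i
        self.i += 1
        s_cl = self.sign(self.i, r_i, r_prev)
        return (self.i, r_i, r_prev, s_cl)
```

Each state root can be extended only once: a second request carrying a stale previous root is rejected, which is exactly the fork-prevention property described in Section 3.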
4 Non-deletion Proof

A non-deletion proof is a cryptographic construction that validates one round of operation of the append-only accumulator.

We have the i-th batch of insertions B_i = (k_1, k_2, ..., k_j), where k is an inserted item; all insertions are applied within a single operational round. The root hash before the round is r_{i−1}, and after the round is r_i. The accumulator is implemented as a Sparse Merkle Tree (SMT).

The non-deletion proof generation for batch B_i works as follows:

1. The new leaves in batch B_i are inserted into the SMT.

2. For each newly inserted leaf, the sibling nodes on the path from the leaf to the root are collected. Siblings present or computable from other leaves in the batch are discarded. Siblings can be further organized by dividing them into layers, for more efficient verification. We denote the set as π_i.

...

6. The proof is valid if the checks above passed.

A valid proof demonstrates that, given authentic roots r_{i−1} and r_i, the keys in B_i corresponded to empty leaves prior to the update; that after the update, the values in B_i were recorded at the positions defined by their respective keys; and that there were no other changes.

The complete verification algorithm is presented as Algorithm 2. Note that there are several assumptions: the batch is sorted by keys, and the proof is an array of arrays of tuples, where the outer array divides the siblings into depth layers and each inner array is sorted by keys (the first element of each tuple).

Due to the sparseness of the SMT we can further improve the encoding. For example, instead of checking whether a node's sibling is the next item in the layer's nodes, the next item in the proof array, or an empty element otherwise, we can simply record a number: how many of the next siblings are empty elements (frequent close to the leaves when the SMT is sparsely populated), and similarly for runs of proof siblings (frequent close to the root).
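The core two-root check can be illustrated with a toy Python verifier over a small fixed-depth SMT. The hash function, encoding, and helper names are illustrative, and the run-length encoding of empty siblings described above is omitted for clarity: missing siblings are simply taken from the current proof layer, or assumed to be roots of empty subtrees.

```python
import hashlib

def H(a: str, b: str) -> str:
    # 2-to-1 compression: hash the concatenation of two hex digests.
    return hashlib.sha256((a + b).encode()).hexdigest()

EMPTY_LEAF = hashlib.sha256(b"empty").hexdigest()

def fold(level: dict, proof_layers: list, depth: int) -> str:
    """Fold a sparse set of nodes {index: hash} up to the root.

    Missing siblings come from the current proof layer, or default to
    the empty-subtree hash of that layer.
    """
    empty = EMPTY_LEAF
    for layer in range(depth):
        siblings = dict(proof_layers[layer])
        parents = {}
        for idx in sorted(level):
            parent = idx // 2
            if parent in parents:
                continue  # already computed from the sibling pair
            sib = idx ^ 1
            sib_hash = level.get(sib, siblings.get(sib, empty))
            if idx % 2 == 0:
                parents[parent] = H(level[idx], sib_hash)
            else:
                parents[parent] = H(sib_hash, level[idx])
        level = parents
        empty = H(empty, empty)  # empty-subtree hash one layer up
    return level[0]

def verify_non_deletion(batch: dict, proof_layers: list,
                        r_old: str, r_new: str, depth: int) -> bool:
    # Pre-update root: every batch position must have been empty.
    old = fold({k: EMPTY_LEAF for k in batch}, proof_layers, depth)
    # Post-update root: the same siblings, now with the actual leaf hashes.
    new = fold(dict(batch), proof_layers, depth)
    return old == r_old and new == r_new
```

Both passes consume the same sibling set, which mirrors the two-halves circuit of Section 6: if the siblings reproduce r_{i−1} with the batch positions empty and r_i with the batch values in place, nothing outside the batch changed.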
The proving system used is Groth16 [1], which is known for its small proof size. The proving time depends on the depth of the SMT (logarithmic in its capacity) and the maximum size of the insertion batch. Importantly, the proving effort does not depend on the total capacity of the SMT, enabling fairly large instantiations.

When the Consensus Layer verifies these succinct proofs, the Aggregation Layer operates trustlessly. However, certain redundancy is still required to ensure data availability of the SMT itself.

6 Circuit-Based SNARK Definition

Due to the limited expressivity of an arithmetic circuit (e.g., no data-dependent loops or real branching), the entire computation flow must be fixed at circuit-creation time. It is therefore helpful to pre-process the inputs to create a fixed execution trace. This pre-processing generates a "wiring" signal, which is supplied as part of the witness. This signal dictates the data flow between the hashing units within the circuit.

To preprocess the proof:

1. The hash forest, which includes the proof's sibling nodes and the new batch leaves, is flattened.

2. The nodes are sorted first by layer (from leaves to root) and then lexicographically within each layer.

3. A wiring signal is generated to control the multiplexers (MUXes) at the input of each hashing unit in the circuit.

Let the maximum batch size be k_max and the SMT depth be d. Since the arithmetic circuit is static, it must be designed to accommodate the maximum possible batch size, k_max.

The circuit has two halves, both controlled by the same wiring signal. It is critical to security that the control signal and the proof are the same for both halves. The first half of the circuit computes the pre-update root by treating all leaves in the insertion batch as zero (the value of an empty leaf). The second half computes the post-update root using the actual values from the batch. The number of hashing units in each half of the circuit is approximately O(k_max · d).

Each hashing unit takes its inputs either from the outputs of the previous layer's units or from the set of sibling nodes provided in the proof. The pre-processing step encodes the positions of batch and proof elements into these control signals, which are then supplied as part of the witness.

Each hashing cell in the circuit, as depicted in Figure 5, is a template consisting of two input multiplexers and one 2-to-1 compressing hash function.

The MUX inputs for the leaf layer of the first half are connected to a vector containing:

• The "empty" leaf value (0).
• All new leaves in the batch, which are mapped to "empty" (0).
• The "proof" or sibling hashes (π_i).

The MUX inputs for the leaf layer of the second half are connected to a vector containing:

• The "empty" leaf value (0).
• The batch of new leaves (I).
• The identical "proof" or sibling hashes (π_i).

The MUXes for internal layers are connected to a vector containing:

• The "empty" leaf value (0).
• Output hashes from the previous layer's cells.
• The "proof" or sibling hashes (π_i).

Both halves' MUXes are controlled by the same wiring signal.

6.1 Performance Indication

Initial benchmarks on a consumer laptop (Apple M1) using the Poseidon hash function indicate a proving throughput of up to 25 transactions per second.

Figure 4: Circuit structure.
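The wiring pre-processing described in Section 6 can be sketched as follows. The tags (EMPTY, BATCH, PROOF, PREV) and the output layout are illustrative stand-ins for the actual MUX selector encoding; the sketch only derives, per hashing cell, where its two inputs come from.

```python
# Toy pre-processing of the "wiring" signal: for each hashing cell,
# determine whether each input is an empty-subtree hash, a batch leaf,
# a proof sibling, or the output of a previous layer's cell.

EMPTY, BATCH, PROOF, PREV = "empty", "batch", "proof", "prev"

def wiring(batch_keys, proof_layers, depth):
    level = sorted(batch_keys)                      # live node indices, layer 0
    source = {(0, k): (BATCH, k) for k in level}    # where each node comes from
    signals = []                                    # one list of cells per layer
    for layer in range(depth):
        in_proof = {idx for idx, _ in proof_layers[layer]}
        cells, parents = [], []
        for idx in level:
            parent = idx // 2
            if parent in parents:
                continue                            # pair already wired
            sib = idx ^ 1
            if sib in level:
                sib_src = source[(layer, sib)]      # computable from the batch
            elif sib in in_proof:
                sib_src = (PROOF, sib)              # supplied by the proof
            else:
                sib_src = (EMPTY, layer)            # empty-subtree hash
            own_src = source[(layer, idx)]
            cell = (own_src, sib_src) if idx % 2 == 0 else (sib_src, own_src)
            cells.append(cell)
            parents.append(parent)
            source[(layer + 1, parent)] = (PREV, len(cells) - 1)
        signals.append(cells)
        level = parents
    return signals
```

Because the selectors depend only on key positions, not on leaf values, the same wiring signal can drive both halves of the circuit, as required above.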
The privacy of the witness (the zero-knowledge property) is not a requirement for this application. The primary goal is to achieve computational integrity and succinctness. Therefore, while the underlying technology is often referred to as "ZK", we are using it as a Scalable Transparent ARgument of Knowledge (STARK).

7.1 zkVM Performance

On a 10-core Apple M1 CPU, proving a 500-transaction batch using SHA-256 within the SP1 zkVM takes approximately 5 minutes. However, the SP1 framework is robust and designed for scalability, supporting distributed prover networks, industrial-grade GPUs, proof chunking and recursion, and other advanced features to tackle larger problems with brute force.

7.2 Optimization Ideas

The ZK proving performance is dominated by the cryptographic hashing primitive used by the program.

At the time of writing, the SP1 zkVM⁴ offers precompiles for standard hash functions like SHA-256, which accelerates their execution compared to a direct RISC-V implementation. The use of these precompiles (also known as coprocessors or chips) can be observed in the prover's output, which details the number of calls to each specialized circuit (e.g., SHA_EXTEND, SHA_COMPRESS). However, even with acceleration, proving SHA-256 is computationally expensive.

A possible optimization is to use "ZK-friendly" hash functions. These functions are highly efficient when implemented directly in arithmetic circuits, where there is direct access to the native field elements. Their performance advantage in a RISC-V zkVM is more nuanced, as there is an overhead in translating between the VM's 32-bit integer registers and the underlying finite field elements used by the prover. Operations like range-checking, which are necessary to prevent overflows, are expensive in ZK. There are attempts to create precompiles for ZK-friendly hash functions⁵, with limited real-world effect.

7.3 More on ZK and Hash Functions

Standardized cryptographic hash algorithms like SHA-2 were optimized mostly for minimal physical chip area, a design choice driven by NIST. Others, like the Blake family, were designed for fast execution on CPUs. They all include numerous bitwise operations (e.g., rotations, XOR) that are native to silicon logic but are notoriously inefficient to prove in ZK. Proving such operations is expensive, because a full field element (e.g., a 254-bit value on the BN254 curve) must be used to represent a single bit.⁶ ZK provers are most efficient with arithmetic operations native to the underlying finite field, such as addition and multiplication (and lookups on some ZK stacks). Other operations must be implemented indirectly.

There are newer cryptographic hash functions designed specifically with ZK efficiency in mind. Functions like Poseidon and Poseidon2 are gaining acceptance but are still relatively new. Some are better suited to large fields (e.g., Reinforced Concrete), some to smaller ones (e.g., Monolith), also depending on the proof system's lookup table support. Even newer examples exhibiting even higher performance are Griffin and Anemoi. Some, like GMiMC, offer a compromise with better silicon CPU performance.

A key advantage of these hashes is that they operate directly on field elements, avoiding the costly translation from integer representations. The security level is defined by the underlying field and instantiation parameters. While some VMs, like the Cairo VM used by Starknet, provide direct access to field elements, they are often highly specialized for particular use cases, such as L2 rollups.

7.4 Performance Roadmap

The overall approach is sound: the proving time depends on the size of the addition batch and, notably, it does not have a linear relationship to the total capacity of the data structure. The verification algorithm is tight.

To overcome the performance bottleneck, a ZK-friendly hash function is essential. The ideal proving framework would provide direct access to the native field elements of its arithmetization layer, a feature not typically available in general-purpose zkVMs. Execution trace generation must be highly efficient (a criterion that excludes older frameworks like Cairo 0). The prover itself must be fast. The state of the art uses small prime fields (e.g., BabyBear, Mersenne-31) and FRI-based polynomial commitment schemes, like Circle-STARKs [2]. Promising implementations are Plonky3⁷ and STwo⁸. Considering the need for maturity, modularity, and an open-source license, Plonky3 emerges as the strongest option.

To utilize the Plonky3 framework, the verification logic must be implemented as a custom AIR (Algebraic Intermediate Representation) circuit rather than a general-purpose program.

7.5 Custom AIR Circuit

Extrapolating from benchmarks of similar computations⁹ using the Plonky3 framework, the Poseidon2 hash function, and a small finite field, the projected performance of such a stack on a 10-core CPU is approximately 10 000 tx/s. The "blowup factor" parameter is 2^1, resulting in a 1.7 MB proof. A more conservative configuration with a blowup factor of 2^3 would yield approximately 2500 tx/s with a 0.7 MB proof and higher memory requirements for the prover.

These figures indicate that operating a very large-scale Aggregation Layer in a trustless manner is economically feasible.

We note that the Poseidon family of hash functions is relatively new and has undergone less cryptographic analysis than traditional hash functions like the SHA-2 or SHA-3 families. However, among the new class of ZK-friendly arithmetic hash functions, Poseidon has undergone the most public scrutiny and can be tentatively considered secure for this type of application. It offers an estimated 50× improvement in proving performance compared to an efficient standard hash function like Blake3.

8 Summary

Zero-knowledge proof systems offer a powerful method for creating succinct proofs of performing some computation; in our case, checking consistency proofs of a distributed cryptographic data structure. For use cases with small changesets, a simple hash-based proof, whose size is linear in the batch size, is optimal. However, as batch sizes increase and bandwidth becomes a constraint, the constant or near-constant size proofs generated by ZK systems become more advantageous.

Different proof systems offer different trade-offs. The relevant properties are: proving effort, necessity of a trusted setup, generality of the trusted setup, interactivity, proof recursion-friendliness, and, of course, properties like availability of tooling, maturity, and trustworthiness. Some, like STARKs, are relatively fast to prove but have fairly large proofs, and avoid undesirable properties such as a trusted setup. Others, like Groth16, produce small proofs but require more proving effort and a circuit-specific trusted setup. For more complex applications, hybrid approaches and proof recursion can be employed. Figure 6 illustrates the proof size trade-off.

⁴ [Link]
⁵ [Link]
⁶ See e.g. [Link]master/circuits/sha256/[Link]
⁷ [Link]
⁸ [Link]
⁹ Experiment with iterative hashing and a hash-based signature scheme [Link]

References

[1] Jens Groth. On the size of pairing-based non-interactive arguments. Cryptology ePrint Archive, Paper 2016/260, 2016.

[2] Ulrich Haböck, David Levit, and Shahar Papini. Circle STARKs. Cryptology ePrint Archive, Paper 2024/278, 2024.

[3] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. 2009.
Figure 6: Proof size trade-off (axis: proof size; series include the hash-based consistency proof).

Algorithm 2 Verification of non-deletion proof
  function VerifyNonDeletion(π, r_{i−1}, r_i, P)
      ▷ Proof π is a by-layer array of ...