Simulation of Stochastic Blockchain Models: Workshop On Blockchain Dependability
Abstract—This paper builds the foundations of a simulation tool for blockchain-based applications. It takes advantage of the great expressiveness and extensibility of the PyCATSHOO framework to deal with the wide variability of blockchain implementations and properties of interest. A simple stochastic model of a generic blockchain-style distributed consensus system and associated performance indicators are proposed (performance in terms of consistency and ability to discard double-spending attacks). Monte Carlo simulations are applied to assess the indicators and determine their sensitivity to the variation of input parameters.
Index Terms—Blockchain, Markov process, stochastic automata, Monte Carlo simulation, consistency, double-spending attack, PyCATSHOO

I. INTRODUCTION
Blockchain technology has recently attracted widespread interest because of its huge potential for securing decentralized applications. In practice, it refers to a wide range of implementations (Bitcoin [1], Ethereum [2], Hyperledger [3], etc.) sharing a common purpose, basically: to register data on a cryptographic ledger written by a peer-to-peer network that runs a protocol to ensure a consensus (an update of the ledger is symbolized by the chaining of a block). Those implementations are characterized by a set of primitives: consensus protocol, type of registered data, fork resolution rule, etc. Each of them has advantages and drawbacks and, for a given use case, it is often non-obvious to determine a priori which implementation will be the most relevant. A tool to assess the dependability and performance of a solution early in the design process would be a major asset to assist the engineering of blockchain-based applications. The tool should be based on a generic blockchain model that can be used as is to evaluate properties across implementations, or be customized to fit a particular one.
This paper introduces the PyCATSHOO framework to build the foundations of such a tool. PyCATSHOO is a Python library developed by EDF R&D to build hybrid stochastic automata. It comes with a Monte Carlo simulation [4] engine to assess probabilistic attributes of the model, in an easier way than with a tedious mathematical analysis of the model. In particular, this paper shows how well-known (although difficult to prove) results on blockchain consistency and ability to discard a double-spending attack can be computed by Monte Carlo simulations that are easy to set up.
Section II compares the paper to a selection of related works. In order to clarify our approach while staying as generic as possible, a lightweight stochastic model of blockchain is proposed in section III (only the block appending and broadcasting aspects are considered). The model is then implemented in the PyCATSHOO framework in section IV. The concepts of blockchain protocol consistency and ability to prevent double-spending attacks are then interpreted in our stochastic framework, and some related probabilistic indicators are assessed using PyCATSHOO in section V. Finally, section VI concludes this paper and proposes directions for future works.
This work has been funded by the EDF R&D project DURIN (Dependable Uses of Reliable blockchaIN).

II. RELATED WORKS
The current state of the art on blockchain modelling can be split into two categories: deterministic and stochastic. The first one is generally associated with formal proof purposes, either on the blockchain protocol itself or on smart contracts built on top of it.
• [5] proposes a Coq-aided proven "agnostic" blockchain protocol (currently the consistency is ensured only if the network topology is a clique, but stronger guarantees are targeted by the authors for future works).
• [6] exploits the expressiveness of infinite Mealy machines to capture the blockchain construction process in a generic way and describes properties a protocol should possess to build a consistent blockchain, with two qualities of the criteria: strong and eventual (this contribution will be discussed in subsection V-A).
• [7] proposes a communicating automata model of the triptych: blockchain construction, smart contract, users' behaviour (honest and hacker) and exploits the statistical model-checker BIP to quantify the risk for an implementation to not satisfy its specification ([8] proposes a NuSMV model with the same idea of three-fold behaviour and the same purpose of model-checking, although differently developed).
Contributions on stochastic modelling of blockchain are mostly shaped to compute probabilistic attributes analytically, even if the last paper referenced below validates its model by a simulation.
• Nakamoto's original paper introducing Bitcoin [1] provides the first results on assessing the risk that an
attacker could win the race against honest peers in a blockchain.
• [9] shows the security of the Bitcoin protocol, which is reduced to two properties: persistence and liveness. Actually these properties are based on probabilistic indicators, namely the common prefix between blockchain copies and the chain quality that measures the influence of adversarial peers on the blockchain. This approach has shown its genericity in [10], where it is applied to extend the proof to the more generic protocol GHOST [11] (adopted by several blockchains including Ethereum).
• [12] models the evolution of the partition between honest and adversary nodes along the blockchain as a 1-dimensional random walk. This model allows them to demonstrate interesting results on the safety of several blockchains, namely Bitcoin-NG, PeerCensus and BizCoin (this contribution will be discussed in subsection V-B).
• Finally, [13] models the mining process by an inhomogeneous Poisson process. It shows that when the hash rate increases exponentially, the difficulty control implemented by the Bitcoin protocol works so that the mean block time (i.e. the mean time between the mining of two consecutive blocks) converges to a constant (10 minutes in practice). Then it proposes an improvement of such control to speed up the convergence. Its analytic results are confirmed by simulation and confronted with actual Bitcoin and Namecoin histories.
While the above referenced contributions give complementary keys to build a generic blockchain model and define meaningful performance attributes, none of them proposes a framework to ease the tuning of the model and exploit Monte Carlo simulation to assess attributes, which is a convenient way when models become complex¹.
¹ Note that this list of articles is a selection of the contributions that have most inspired this work; it does not aim at completeness.

III. BLOCKCHAIN STOCHASTIC MODEL
A basic stochastic model is proposed in this section to capture the block creation and broadcasting process. To stay as generic as possible, blocks are abstract objects in this model; they should be elicited to capture the underlying ledger evolution (which implies defining the type of registered data and the registering mechanisms).

A. Blocktree data structure
A blockchain is actually a particular branch of a rooted tree, i.e. a directed acyclic graph such that all nodes have a unique father except one, called the root, which has none. In blockchain jargon, the root is called the genesis block. Calling $\mathcal{B}$ the set of blocks, we can formally define a blocktree.
Definition 1. A blocktree $bt \in BT$ is a 2-tuple $\langle B, E \rangle$, where:
• $B \subset \mathcal{B}$ is a finite non-empty set of valid blocks², including at least one element $b_0$ (the genesis block).
• $E \subset B^2$ is such that there exists a unique path from $b_0$ to any other block.
Formally, $\forall b \in B \setminus \{b_0\}, \exists n \in \mathbb{N}^*, \exists (b_1, \dots, b_n) \in B^n \mid b_n = b \wedge \forall i \in [\![1, n]\!], (b_{i-1}, b_i) \in E$
$n$ is called the depth of $b$ (and its associated path).
² The definition of a block's validity comes with the elicitation of a block.
Expanding a blocktree $bt = \langle B, E \rangle$ from a block $b \in B$ with a new block $b' \notin B$ results in the blocktree $bt' = \langle B \cup \{b'\}, E \cup \{(b, b')\} \rangle$. The expansion operation is then a partial mapping from $BT \times \mathcal{B}^2$ to $BT$.
Moreover a blockchain protocol defines a total order $\succ$ over $B$, preserving $E$, i.e. $(b, b') \in E \implies b' \succ b$. This order is the cornerstone of the protocol since the definition of the so-called blockchain is built on it.
Definition 2. Given a blocktree $bt = \langle B, E \rangle$ and a total order $\succ$ over $B$, the blockchain is the unique path from the genesis block $b_0$ to the last block $b_l = \max B$.

B. Distributed handling of the blockchain
A blockchain is built by a network of processes sequentially applying the expansion operation, starting from the initial blocktree $\langle \{b_0\}, \emptyset \rangle$. In practice, every process refers to its own view of the blocktree and strives to build a consensus with the others on the shared blockchain while increasing its depth. For this purpose, they continuously try to "build" valid blocks. When such a valid block is found by a process, it expands its local blocktree from its last block (i.e. the last block of its own view of the blockchain) and broadcasts the new block to the others. When a process receives a new block from another one, it updates its local blocktree by expanding it with the new block. In practice, forks may actually be observed due to block broadcasting delays.
Building a new valid block is an operation that may take many forms depending on the protocol. It consists at least in providing a proof that the process is legitimate to append a block. The two most widely considered kinds of such proof are called Proof-of-Work (PoW) and Proof-of-Stake (PoS). In the case of a PoW-based protocol, the process has to solve a hard computational problem (often a constrained hashing), whereas for a PoS-based protocol, it has to show that it is deeply involved in the blockchain (in practice a proof that it holds a lot of tokens). We propose to model the chance for a process $i$ to build a valid block by a unique parameter $m_i \in \mathbb{R}^+$, called the merit (which symbolizes for example the process hashrate in the case of PoW or the amount of its balance in the case of PoS). Then we introduce an abstract oracle that randomly chooses a process to build each new block according to their merit (a similar idea can be found in [6]). A blockchain protocol is usually designed so that a block is appended regularly with a constant mean block time $t_b \in \mathbb{R}^+$ (e.g. 10 minutes for Bitcoin, 12 seconds for Ethereum). We then propose a Markovian model for the oracle behaviour.
Definition 3. The oracle behaves as a continuous-time Markov process which indefinitely selects a new process $i$ among a set
of processes $P$ to build a new valid block, according to its normalized merit $\hat{m}_i = \frac{m_i}{\sum_{j \in P} m_j}$, with a rate $\lambda = \frac{1}{t_b}$.
The delay from a block creation by a process $i$ to its reception by a process $j$ depends on several network-related factors (like topology, bandwidth, instantaneous load). As a first approximation, we can abstract these factors by introducing a global mean network transit time $t_n \in \mathbb{R}^+$. But we can be a bit more precise and consider network asymmetries, by defining a mean network transit time $t_{n,i}$ for each process $i$. With this refinement, the mean time to transfer a block between two processes $i$ and $j$ is $\frac{t_{n,i} + t_{n,j}}{2}$. Hence the last definition of our model can be stated.
Definition 4. The reception by a process $i$ of a block appended by a process $j \neq i$ occurs with a rate $\mu_{i,j} = \frac{2}{t_{n,i} + t_{n,j}}$.
Figure 1 represents intensionally the system of Markov chains that can be built from Definitions 3 and 4.
Fig. 1. Markov chain system modelling a blockchain protocol

C. Discussion
The proposed model makes several simplifying assumptions, which are discussed hereafter:
• In practice, for protocols relying on particular consensus mechanisms (like Proof of Work), all distributed processes try to build a valid block in parallel. It is possible that several processes succeed in this task in a very short period, which causes a fork between concurrent branches, until a new block is chained after one of them and is received by all other processes. Since our oracle selects a process after a random delay, this scenario is still possible although less probable. The oracle could be refined to increase the chance for several processes to append a new block in a short period but it would be [...] section that PyCATSHOO allows to declare deterministic delayed transitions as well as stochastic ones).
• All the distributed processes share a global clock for timestamping the block creations through the oracle abstraction. In reality, each process timestamps its blocks according to its own clock, which may cause local inconsistencies (which should be fixed by the protocol) and bias in network transit times. But we argue that at this modelling level, this phenomenon can be neglected and abstracted into the mean network transit time. Moreover, this assumption significantly lowers the simulation time, which is essential to perform Monte Carlo simulation (for which a lot of histories have to be simulated).
• The merit of each process is assumed known and fixed during the scenario. Considering variable merits with uncertainties on their values constitutes a way to refine the model.
An advanced mathematical analysis of the model would allow determining the bias it introduces in comparison to a given blockchain protocol, but this is not in the scope of this paper. We will see in section V that this lightweight model is sufficient to rediscover painlessly well-known results, which can otherwise either be observed on the execution of real implemented protocols (which takes a lot of time), or be obtained by a rigorous mathematical analysis. But reasoning on our model to solve the dependence between the parameters and the indicators on blocktree shape and, more generally, on protocol consistency is tedious work (and even more so on more complex models). The next sections introduce the PyCATSHOO framework and show how to take advantage of it to assess these indicators.

IV. PYCATSHOO IMPLEMENTATION
PyCATSHOO³ is the combination of a Python library to describe distributed hybrid stochastic automata and a tool to perform Monte Carlo analysis on models. It is a convenient approach to perform probabilistic assessments on systems that combine both discrete and continuous behaviours⁴. The main use of PyCATSHOO at EDF is for model-based safety analysis of power plants (discrete and continuous behaviours are caused respectively by failure/repair events of components and by the evolution of physical variables such as pressure and temperature). In this section, some principles of the PyCATSHOO paradigm are recalled (they can also be found in [14] and [15]), then the implementation keys of the model defined in section III in this framework are provided.
³ PyCATSHOO is freely accessible at http://pycatshoo.org and should be open source soon.
⁴ Note that the model introduced in this paper is purely discrete.

A. Reminder on PyCATSHOO principles
A PyCATSHOO model is a system of components that communicate through message boxes. Each component is defined by a 4-tuple $\langle V, B, A, R \rangle$, where:
• $V = I \cup E$ is a set of variables partitioned into a subset [...] input ports through which internal and external variables are respectively exported and imported;
• $A$ is a set of stochastic automata;
• $R = D \cup C$ is a set of evolution rules of the component state variables, partitioned into a subset of rules that are applied on the occurrence of discrete events and the subset of rules determining the continuous dynamics.
A stochastic automaton $a \in A$ is defined by a 3-tuple $\langle S, s_0, T \rangle$, where:
• $S$ is a set of states;
• $s_0 \in S$ is the initial state;
• $T = T_s \cup T_d$ is a set of transitions $t$, each of which is itself defined by a 4-tuple $\langle s, g, d, p \rangle$, where:
– $s \in S$ is the origin state;
– $g \in V \rightarrow \{True, False\}$ is a guard built on the variables, determining the validation condition of the transition;
– $d \in \mathbb{R}^+$ is a parameter used to generate a delay before firing a validated transition. If the transition is stochastic ($t \in T_s$), the delay is randomly chosen according to an exponential law of parameter $d$, whereas if the transition is temporized ($t \in T_d$), the delay is simply the value of the parameter $d$. Note that $d$ can be specified using variables in $V$ such that its value may change with the model evolution.
– $p \in S \rightarrow [0, 1] \mid \sum_{s \in S} p(s) = 1$ is a probability distribution on the state space to select the destination state (if a single destination state $s \in S$ is possible, then $p(s) = 1$). Once again, $p$ can be specified using variables in $V$.
An evolution rule $r \in C$ (determining the continuous dynamics of variables) is specified as an equation, ordinary differential or not, whose resolution for a given simulation instant is handled by the simulator engine. The other kind of evolution rule $r \in D$ (determining the discrete dynamics of variables) is specified as a function called a sensitive method, because it is executed when a specified event occurs:
• when the simulation starts (used to specify the initial assignment of the variables);
• when a transition is fired;
• when an automaton state is left or entered (the sequential order of the three events associated with the firing of a transition $t$ from state $s_1$ to state $s_2$ is quite intuitive: leaving $s_1$ → firing $t$ → entering $s_2$);
• when a referenced external variable changes (used to propagate the effects of an event occurring in another component).
Finally, components can be connected through their message boxes. A connection between $x$ and $y$ through their respective message boxes $b_x$ and $b_y$ is valid if and only if any variable imported by one is a variable exported by the other. Formally:
$b_x \cap E_x = b_y \cap I_y$
$b_y \cap E_y = b_x \cap I_x$
The formal semantics of a PyCATSHOO model will not be detailed in this paper. Let us only state the general idea of what the simulator engine provides: a finite history of variable assignments, randomly generated according to the stochastic and deterministic evolution of the model, which is specified by the components' automata and rules. This engine can then be exploited to assess probabilistic indicators on the model (e.g. the mean value of a variable) by taking advantage of Monte Carlo simulations.
The main benefit of the PyCATSHOO framework is that it inherits the expressiveness of Python itself. Indeed, although the types of PyCATSHOO variables are basic (boolean, integer, float, string), convenient intermediate objects can be created and manipulated through (discrete) evolution rules alongside the proper PyCATSHOO variables to ease the model specification.

B. PyCATSHOO model of blockchain
The model described in section III can be implemented in the PyCATSHOO framework by defining three components, namely: the Process, the Oracle and the Blocktree. An overview of these components is depicted in Figure 2.
Fig. 2. Overview of the PyCATSHOO model of a blockchain generic protocol
A block is implemented as a pure Python object, fully determined by a (unique) hash, its father block (None for the genesis block), a timestamp and the author process that built it (its depth is simply the depth of its father incremented). A Python dictionary stores all blocks and is used to retrieve a block instance given its hash. We define a basic order for blocks (this order is total since two blocks cannot be built at the same instant):
$b_1 \succ b_2 \iff b_1.\mathit{depth} > b_2.\mathit{depth} \;\vee\; (b_1.\mathit{depth} = b_2.\mathit{depth} \wedge b_1.\mathit{timestamp} > b_2.\mathit{timestamp})$
The (unique) Blocktree component always has a perfect knowledge of already appended blocks. For a process (identified by a unique address), a block can be either known (already received) or pending (not yet received). The block creation is scheduled by the oracle firing the transition from state waiting to state tokenGenerated (parameter $\lambda$ is the inverse of meanBlockTime as stated in section III). When this transition is fired, the following instantaneous sequence is performed:
1) the sensitive method selectProcess randomly selects one of the processes according to its merit and assigns its address to the variable tokenHolder;
2) tokenGenerated becomes True⁵, which triggers the transition from working to claimTkn for each process;
   ⁵ Declaring an automaton state as a message box port is a convenient syntactic way to define a boolean variable which is True whenever the state is active.
3) the transition from tokenGenerated to waiting is fired;
4) the transitions from claimTkn to tknHeld then to working are fired only for the selected process;
5) the sensitive method consumeToken (of the token holder) creates a new instance of Block (whose timestamp is the current simulation time and whose father is the last block known by the process), appends it to knownBlocks and updates lastBlock with its hash;
6) the modification of a variable lastBlock has the effect of calling the method appendBlock of the Blocktree component. The method appends the new block to its list Blocks, then updates the variable appendedBlock;
7) the guard of the transition from claimTkn to working is now validated for all remaining processes. The transition is fired and the associated method newPendingBlock appends the new block to their list pendingBlock.
Reception of pending blocks by processes is scheduled by a set of automata (parameter $\mu$ is computed from the variables meanTransitTime of the receiver and the block author as stated in section III). Because several blocks can be in reception in parallel, a block may arrive before its father; the guard of the transition between arrived and idle therefore ensures that a block is actually transferred from pendingBlocks to knownBlocks after its father. This transfer operation is performed by the sensitive method receive, which also computes the new blockchain to update the lastBlock variable.
The code size is only around 500 lines of Python⁶. As discussed in subsection III-C, our model can be easily enhanced in many ways, for instance to fit a particular blockchain paradigm, to consider faulty behaviour of processes or to take into account variable probabilistic input parameters.
⁶ The code is available at http://pycatshoo.org/Model_Samples.html
The next section shows how the PyCATSHOO Monte Carlo simulation engine can be exploited to assess performance indicators of a blockchain protocol in terms of consistency and ability to discard a double-spending attack.

V. ASSESSMENT OF BLOCKCHAIN ATTRIBUTES
A. Blockchain consistency assessment
Consistency is a property of great interest to qualify the performance of a blockchain protocol. Informally, a protocol is consistent if its processes succeed in building a consensus on the blockchain. Several formal definitions are proposed in the literature. In particular, [6] defines two kinds of consistency criteria, built on the prefix relation (a chain $c_1$ prefixes another chain $c_2$ if and only if the last block of $c_1$ is an ancestor of the last block of $c_2$):
• the strong consistency holds when there exists a prefix relation between all processes' blockchains. In other words, processes never fork.
TABLE I
ASYMPTOTIC VALUES OF THE CONSISTENCY INDICATORS FOR SEVERAL PARAMETER ASSIGNMENTS
(For each value of r, the upper, middle and bottom rows are respectively the consensus probability, the consistency rate and the worst process delay.)

  r \ n |      2      3      4      6     10     20     40     60    100
--------+---------------------------------------------------------------
   0.1  |  0.913  0.868  0.839  0.803  0.761  0.702  0.657  0.635  0.598
        |  0.955  0.938  0.930  0.922  0.914  0.909  0.908  0.908  0.907
        |  0.094  0.143  0.180  0.220  0.271  0.347  0.415  0.442  0.505
--------+---------------------------------------------------------------
   0.2  |  0.837  0.762  0.718  0.660  0.587  0.505  0.455  0.412  0.368
        |  0.912  0.884  0.870  0.858  0.844  0.832  0.831  0.830  0.832
        |  0.189  0.279  0.338  0.418  0.533  0.671  0.771  0.853  0.945
--------+---------------------------------------------------------------
   0.5  |  0.686  0.559  0.479  0.391  0.304  0.231  0.180  0.168  0.088
        |  0.823  0.766  0.735  0.705  0.683  0.663  0.656  0.655  0.646
        |  0.424  0.614  0.754  0.918  1.080  1.264  1.373  1.406  2.111
--------+---------------------------------------------------------------
   0.7  |  0.602  0.453  0.369  0.280  0.192  0.118  0.073  0.054  0.037
        |  0.769  0.698  0.665  0.627  0.598  0.579  0.572  0.568  0.559
        |  0.607  0.895  1.070  1.311  1.617  1.973  2.316  2.500  2.761
--------+---------------------------------------------------------------
   0.99 |  0.515  0.347  0.264  0.173  0.109  0.054  0.026  0.017  0.009
        |  0.715  0.625  0.586  0.538  0.513  0.484  0.475  0.467  0.463
        |  0.844  1.238  1.476  1.810  2.169  2.615  3.021  3.279  3.554
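As a reading aid for Table I, the short Python sketch below transcribes the consensus probabilities (the upper value of each cell) and checks how they vary with n and r. It is only an illustration built on the published values: the names CONSENSUS_PROBABILITY, decreases_with_n, decreases_with_r and lookup are hypothetical helpers introduced here, not part of PyCATSHOO or of the authors' code.

```python
# Consensus probabilities transcribed from Table I (upper value of each cell).
# All names below are illustrative assumptions, not PyCATSHOO identifiers.
N_VALUES = [2, 3, 4, 6, 10, 20, 40, 60, 100]
CONSENSUS_PROBABILITY = {
    0.1:  [0.913, 0.868, 0.839, 0.803, 0.761, 0.702, 0.657, 0.635, 0.598],
    0.2:  [0.837, 0.762, 0.718, 0.660, 0.587, 0.505, 0.455, 0.412, 0.368],
    0.5:  [0.686, 0.559, 0.479, 0.391, 0.304, 0.231, 0.180, 0.168, 0.088],
    0.7:  [0.602, 0.453, 0.369, 0.280, 0.192, 0.118, 0.073, 0.054, 0.037],
    0.99: [0.515, 0.347, 0.264, 0.173, 0.109, 0.054, 0.026, 0.017, 0.009],
}


def is_strictly_decreasing(values):
    """True if every value is strictly smaller than the previous one."""
    return all(a > b for a, b in zip(values, values[1:]))


def decreases_with_n(table):
    """Check each row (fixed r): does the indicator drop as n grows?"""
    return all(is_strictly_decreasing(row) for row in table.values())


def decreases_with_r(table):
    """Check each column (fixed n): does the indicator drop as r grows?"""
    ratios = sorted(table)
    columns = zip(*(table[r] for r in ratios))
    return all(is_strictly_decreasing(column) for column in columns)


def lookup(table, r, n):
    """Return the tabulated value for a ratio r and a number of processes n."""
    return table[r][N_VALUES.index(n)]


if __name__ == "__main__":
    print("decreasing in n:", decreases_with_n(CONSENSUS_PROBABILITY))
    print("decreasing in r:", decreases_with_r(CONSENSUS_PROBABILITY))
    print("r = 0.5, n = 10:", lookup(CONSENSUS_PROBABILITY, 0.5, 10))
```

On these values both checks succeed for the consensus probability. The same helpers could be applied to the two other indicators of each cell, keeping in mind that Monte Carlo estimates carry sampling noise, so small non-monotonic steps between neighbouring cells would not be surprising.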
Fig. 3. Interpolation of asymptotic values of the consistency indicators for 2 ≤ n ≤ 100 and 0.1 ≤ r < 1 (red is a better consistency than blue)
• the eventual consistency holds when the greatest common prefix between all processes' blockchains always eventually grows.
To ease the understanding of the results, we assume a perfect symmetry of the network and a perfect fairness between processes, formally:
$\forall (i, j) \in P^2, \; t_{n,i} = t_{n,j} \wedge m_i = m_j$
For our model, it is clear that the strong consistency (i.e. the fork probability equals 0) is guaranteed only if the mean network transit time $t_n$ approaches 0. On the other hand, the perpetual eventual growth of the greatest common prefix is still possible while $0 < t_n < t_b$, although more or less frequent depending on the parameters. Intuitively, the lower the number $n$ of processes and the ratio $r = \frac{t_n}{t_b}$ are, the better the consistency is. To validate this intuition while refining the characterization of consistency, we introduce three indicators, which can be seen as three complementary metrics of consistency (in the following, we call the absolute blockchain the most advanced among all locally viewed blockchains according to the order $\succ$):