Lecture 5. The SMR and BB consensus problems and the Dolev-Strong protocol.

Resources:
HW:

Having covered digital signatures, we now understand why we can trust that each tx in a blockchain comes from a particular public address: it was signed by the private key associated with that public address. (Remember that public address ~= public key.)

We now start addressing another core component of blockchains: the consensus mechanism. So far we have covered the basics of PoW and PoS without much detail. We now wish to build a framework around consensus mechanisms, so that we can reason about them mathematically. We start with more basic consensus protocols in the permissioned setting, meaning that the set of nodes is known in advance. [PoW and PoS are quite advanced consensus protocols in the permissionless setting, meaning anyone can become a node/miner/validator.]

Permanent assumptions from now on:

  1. Internet exists, and serves as a communication layer for nodes.
  2. Cryptography exists:
    1. Hash functions
    2. Digital signatures

The SMR problem

State Machine Replication (SMR), studied since the 1980s, is concerned with the problem of keeping several data centers in sync, which is itself motivated by the need for backups. […]

It is a really interesting fact that the problem of decentralization in blockchains (from 2008 onward) is extremely similar to the SMR problem.

SMR problem [“data centers trying to stay in sync”]
  • Clients submit txs to nodes (data centers)
  • Each node maintains local history (append only data structure)

Goal: a protocol, i.e. event-driven code to be run by the nodes, that satisfies:

  1. Consistency (or safety): all nodes agree on their histories [for each pair of nodes, the history of one is a prefix of the history of the other; in other words, lagging is okay, but a different ordering of txs is not]
  2. Liveness: every valid submitted tx is eventually added to all nodes’ history.
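The consistency condition can be made concrete with a small check. The following is only a minimal sketch, assuming histories are modeled as Python lists of tx identifiers; the helper names `is_prefix` and `consistent` are hypothetical and not part of the lecture.

```python
from itertools import combinations

def is_prefix(a, b):
    """True if list a is a prefix of list b."""
    return len(a) <= len(b) and b[:len(a)] == a

def consistent(histories):
    """Consistency: for every pair of histories, one is a prefix of the other."""
    return all(is_prefix(a, b) or is_prefix(b, a)
               for a, b in combinations(histories, 2))

# Lagging is okay: the second node is simply behind the others.
lagging = [["tx1", "tx2", "tx3"], ["tx1", "tx2"], ["tx1", "tx2", "tx3"]]

# A different ordering is not okay: the two nodes disagree on tx1 vs tx2.
reordered = [["tx1", "tx2"], ["tx2", "tx1"]]

print(consistent(lagging), consistent(reordered))
```

Liveness is not checkable from a single snapshot like this, since it is a statement about what eventually happens.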

Plan: introduce strong assumptions, solve the SMR problem, and then relax assumptions one-by-one and see whether the SMR problem is still solvable.

Assumptions

  1. Permissioned setting
    • Each node knows the set of all nodes: {1,2,…,n}

It makes sense for SMR, but not for blockchains. Nonetheless, it is highly useful to study and extrapolate from here.

  2. Public Key Infrastructure (PKI)
    • All nodes have public-private key pairs
    • Each node knows public keys of all other nodes (this is an example of a trusted setup assumption)

We will not be relaxing this condition that much. Note that Bitcoin and initial Ethereum don’t have this assumption.

  3. Synchronous setting
    • Existence of a shared global clock 1___2___3___4___5___6___7___→
    • Bounded message delays (if sent at time t, message arrives by t+1; critical assumption that we will want to relax)

Rmk. This assumption obviously fails IRL during network outages and DoS attacks. Later we will see that relaxing the synchronous assumption (the asynchronous setting) breaks either liveness or safety; thus, when evaluating blockchains, one of the key questions is what happens in case of a prolonged network outage. Eventually, we will work in a so-called partially synchronous setting.

  4. All honest nodes
    • All nodes run the intended protocol (no bugs, no downtime, no malicious behavior)

A ridiculous assumption even for SMR; we will start relaxing it very soon.

Solving SMR via the simple “rotating leaders” protocol

If nodes don’t communicate, then the protocol fails to solve the SMR problem whenever clients don’t send txs to all nodes ⇒ nodes need to communicate.

Coordination via rotating leaders

  • Node k is the leader at times k, n+k, 2n+k, …
  • All nodes know this, since we are in the permissioned setting
  • Leader node at time t does broadcasting:
    1. Collects all not-yet-included txs, and orders them arbitrarily
    2. Sends the ordered list of txs to every other node
  • Since we are in the synchronous setting, all nodes have the new list by time t+1, and they append it to their histories.
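The steps above can be sketched as a small simulation. This is an illustrative sketch only, assuming all nodes are honest and the network is synchronous; the function names and the `submissions` format (time step mapped to a list of `(node, tx)` pairs) are hypothetical choices made for the example.

```python
def leader_at(t, n):
    """Node k in {1, ..., n} is the leader at times k, n+k, 2n+k, ..."""
    return (t - 1) % n + 1

def run_rotating_leaders(n, submissions, steps):
    """Simulate time steps 1..steps with n honest nodes.

    submissions[t] is a list of (node, tx) pairs: tx reaches `node` at time t.
    Returns a dict mapping each node to its local history.
    """
    histories = {k: [] for k in range(1, n + 1)}
    pending = {k: [] for k in range(1, n + 1)}
    for t in range(1, steps + 1):
        for node, tx in submissions.get(t, []):
            pending[node].append(tx)
        leader = leader_at(t, n)
        # Step 1: the leader collects the txs it knows that are not yet included.
        batch = [tx for tx in pending[leader] if tx not in histories[leader]]
        pending[leader] = []
        # Step 2: the leader sends the ordered list to everyone; by synchrony
        # every node (the leader included) has it by t+1 and appends it.
        for history in histories.values():
            history.extend(batch)
    return histories

# Clients send each tx to a single node only.
result = run_rotating_leaders(
    n=3,
    submissions={1: [(2, "a"), (3, "b")], 2: [(1, "c")]},
    steps=4,
)
print(result)
```

Every tx waits at most n steps for its receiving node to become leader, which is the intuition behind liveness here; consistency holds because every node appends the same list at every step.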

Proposition. This protocol satisfies consistency and liveness, under all 4 assumptions in the previous section.

Proof: […]

Introducing Faulty/Byzantine nodes

  • Honest node = never deviates from the protocol (intentionally or unintentionally)
  • Faulty node = not an honest node

Liveness and consistency properties have to be tweaked: every valid submitted tx is eventually added to all honest nodes’ history, and histories have to agree only for honest nodes.

Types of faults:

  • Crash faults (hardware errors in SMR)
    • Breaks the rotating leaders protocol if the crash happens in the middle of the leader’s broadcast, in which case consistency fails. Otherwise the protocol is fine, provided the liveness goal is tweaked properly.

  • Omission faults (network outage in SMR)
    • Seriously breaks the rotating leaders protocol, since the leader may send txs to only some of the nodes.

  • Byzantine faults = arbitrary/malicious deviations! (software error in SMR)
    • Obviously breaks the rotating leaders protocol.
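To see concretely how a faulty leader breaks consistency among honest nodes, here is a minimal sketch. It assumes node 1 is the faulty leader and tracks only the histories of honest nodes 2 and 3; the helper `leader_step` and the message format are hypothetical and chosen only for illustration.

```python
def is_prefix(a, b):
    """True if list a is a prefix of list b."""
    return len(a) <= len(b) and b[:len(a)] == a

def consistent(h1, h2):
    """Two histories are consistent if one is a prefix of the other."""
    return is_prefix(h1, h2) or is_prefix(h2, h1)

def leader_step(histories, messages):
    """Deliver one leader broadcast: messages[k] is the tx list sent to node k.
    A missing key models a message that was never sent."""
    for k, history in histories.items():
        history.extend(messages.get(k, []))

# Case 1: a Byzantine leader sends different orderings to nodes 2 and 3.
equivocation = {2: [], 3: []}
leader_step(equivocation, {2: ["tx1", "tx2"], 3: ["tx2", "tx1"]})

# Case 2: the leader's message reaches node 2 but not node 3 (omission, or a
# crash mid-broadcast); then honest node 2 leads and broadcasts tx2 to all.
omission = {2: [], 3: []}
leader_step(omission, {2: ["tx1"]})
leader_step(omission, {2: ["tx2"], 3: ["tx2"]})

print(consistent(equivocation[2], equivocation[3]))
print(consistent(omission[2], omission[3]))
```

In both cases the honest nodes followed the protocol exactly and still ended up with histories where neither is a prefix of the other.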

We now relax [Assumption 4], by allowing f Byzantine nodes (0<f<n)

The Byzantine Broadcast Problem

Rotating leaders protocol works if f=0, but fails if f≥1

Idea: keep rotating leaders, but add a cross-checking subroutine

Byzantine Broadcast problem.

  • One node is the sender; the others are non-senders.
  • The sender has a private input $v^* \in V$ (for us this is an ordered list of txs).
  • The goal is, as before, a certain protocol.

Desired properties of the Byzantine Broadcast protocol:

  1. Termination. Every honest node $i$ eventually halts with some output $v_i \in V$, the node’s best guess for what $v^*$ is.
  2. Agreement. All honest nodes halt with the same output. (~ safety property)
  3. Validity. If the sender is an honest node, then the common output of the honest nodes is the private input $v^*$ of the sender. (~ liveness property)

Rmk 1. (1&2) or (1&3) are easily achieved. As before, (1&2&3) is what’s difficult (both safety and liveness!)
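Rmk 1 can be illustrated with two trivial protocols. The sketch below is an assumption-laden illustration, not the lecture's construction: `received` maps each honest non-sender to the value the sender sent it, values are tuples of txs, and `DEFAULT` is a hypothetical fixed fallback output.

```python
DEFAULT = ()  # hypothetical default output, e.g. the empty list of txs

def protocol_ignore_sender(received):
    """Every honest node outputs DEFAULT: termination and agreement, no validity."""
    return {node: DEFAULT for node in received}

def protocol_trust_sender(received):
    """Every honest node outputs what the sender sent it:
    termination and validity, but no agreement if the sender equivocates."""
    return dict(received)

def agreement(outputs):
    """All honest nodes produced the same output."""
    return len(set(outputs.values())) <= 1

def validity(outputs, sender_input):
    """Only meaningful when the sender is honest: all outputs equal its input."""
    return all(v == sender_input for v in outputs.values())

v_star = ("tx1", "tx2")
honest_run = {2: v_star, 3: v_star}           # honest sender, same value to all
equivocating_run = {2: ("tx1",), 3: ("tx2",)}  # Byzantine sender

print(agreement(protocol_ignore_sender(equivocating_run)),
      validity(protocol_ignore_sender(honest_run), v_star))
print(validity(protocol_trust_sender(honest_run), v_star),
      agreement(protocol_trust_sender(equivocating_run)))
```

Both protocols halt immediately, so termination is trivial; each one gives up exactly one of the other two properties.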

Rmk 2. In BB, there is one output ⇒ BB is a single-shot consensus problem, in contrast to SMR, which is a multi-shot consensus problem.

SMR reduces to Byzantine Broadcast

Assumptions: synchronous and permissioned setting (PKI also, but not important here)

Given: a protocol $\pi$ for the Byzantine Broadcast problem which, with ≤f Byzantine nodes, satisfies Termination+Validity+Agreement in at most T time steps.

Reduction: at each time step 0, T, 2T, … :

  1. Define a leader using round-robin ordering (0→ node 1, T→ node 2, 2T→ node 3, …)
  2. Leader assembles not-yet-included txs into an ordered list $L^*$
  3. Invokes subroutine $\pi$ with leader = sender and $v^* = L^*$
  4. When $\pi$ terminates, every node $i$ appends output $L_i = v_i$ to its local history
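The reduction can be sketched in code by treating the BB protocol as a black box. This is a sketch under stated assumptions: `bb(sender, value)` is a hypothetical callable standing in for the subroutine, and `make_ideal_bb` is an idealized stand-in that simply assumes agreement and validity hold, not an actual BB protocol.

```python
def make_ideal_bb(n):
    """Idealized stand-in for the BB subroutine: every node outputs the
    sender's input. A real protocol would have to earn these guarantees."""
    def bb(sender, value):
        return {i: value for i in range(1, n + 1)}
    return bb

def smr_from_bb(bb, n, rounds, pending):
    """Run the reduction for `rounds` invocations of bb.

    Round r corresponds to time step r*T, where T bounds bb's running time.
    pending[k] lists the txs that node k has heard about from clients.
    """
    histories = {k: [] for k in range(1, n + 1)}
    for r in range(rounds):
        leader = r % n + 1  # round-robin: time 0 -> node 1, T -> node 2, ...
        # The leader assembles its not-yet-included txs into an ordered list.
        l_star = tuple(tx for tx in pending[leader]
                       if tx not in histories[leader])
        # Invoke the BB subroutine with leader = sender and input l_star.
        outputs = bb(leader, l_star)
        # Every node i appends its own output to its local history.
        for i, history in histories.items():
            history.extend(outputs[i])
    return histories

n = 3
histories = smr_from_bb(make_ideal_bb(n), n, rounds=3,
                        pending={1: ["a"], 2: ["b", "a"], 3: ["c"]})
print(histories)
```

Agreement of the subroutine is what keeps the appended lists identical across honest nodes (consistency), and validity is what guarantees that an honest leader's txs actually get included (liveness).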

Theorem. The SMR protocol above satisfies consistency (restricted to honest nodes) and liveness (restricted to honest nodes).

Proof: […]