Lecture 9. The Tendermint Protocol.

HW:

Problem 1 (25 points) Recall the Tendermint protocol for state machine replication.

(a) (10 points) Prove that if f ≥ n/3, the protocol no longer satisfies consistency. That is, give a specific strategy for the adversary (both the behavior of the Byzantine nodes and the message delivery choices) that leads to two honest nodes writing different blocks (at the same height) to their local histories. [If it’s helpful for this or the next part, you can assume that the adversary also gets to pick the fixed round-robin order in which the protocol rotates leaders.]

(b) (15 points) In our liveness proof for the Tendermint protocol, we showed that, provided f < n/3, a transaction known to all honest nodes will eventually appear in all honest nodes’ local histories. Show that the conclusion no longer holds if we assume only that a transaction is known to a single honest node. That is, give a specific strategy for the adversary (with f < n/3) that censors the transactions known to a given honest node indefinitely.

Consensus with Partial Synchrony

Recall: Partially Synchronous model.

|t=0|____Asynch phase____|t=GST, unknown|____Sync phase, ∆ msg delay ______
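A minimal sketch of the timing guarantee in this model, assuming the standard formulation that a message sent at time t is delivered by time max(t, GST) + ∆. The function and parameter names are hypothetical, chosen only for illustration. GST is unknown to the protocol itself, so a bound like this can only be evaluated in an analysis or a simulator.

```python
def latest_delivery_time(sent_at: int, gst: int, delta: int) -> int:
    """Latest time by which a message sent at `sent_at` must be delivered.

    Assumption: before GST the adversary may delay messages arbitrarily,
    but every message is delivered at most `delta` after max(sent_at, gst).
    """
    return max(sent_at, gst) + delta


# A message sent during the asynchronous phase may wait until GST + delta;
# one sent after GST waits at most delta.
print(latest_delivery_time(sent_at=3, gst=10, delta=2))
print(latest_delivery_time(sent_at=15, gst=10, delta=2))
```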

Goals: (for SMR)

  • consistency (always)
  • liveness (eventually, after GST)

Lecture 8: if f ≥ n/3, no protocol satisfies these goals (even with PKI).

Permissioned setting, by communication model:

  • Synchrony: ✅ PKI, any f<n ⇒ BB protocol; ❌ no PKI, f≥n/3 ⇒ no BB protocol
  • Partial synchrony: ✅ ?; ❌ f≥n/3 ⇒ no BA protocol
  • Asynchrony: ❌ f=1 ⇒ no BA protocol

Tendermint: [Buchman-Kwon-Milosevic 2018] In partial synchrony, there exists an SMR protocol that, when f < n/3, satisfies consistency (always) + liveness (eventually).

Tendermint: High-Level Ideas

(Remark: with these consensus protocols the devil is usually in the details. Intuition is frequently misleading in distributed systems, so it is always good to insist on a formal proof.)

Idea #1: iterated single-shot consensus. (output of each = ordered list of txs, “block”)

  • each node maintains its own “height” (latest block it knows) [in asynchronous phase, might be different for different nodes]
  • every single message is annotated with “what is the next block that I am trying to figure out”.
  • if a node is working on block 9, it ignores all messages about other blocks (with a small exception, described later)
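The filtering rule in the last bullet can be sketched as follows. This is an illustration only: the function name and the three labels are hypothetical, and the exception is assumed here to be that QCs for future blocks are kept rather than dropped.

```python
def classify_message(msg_height: int, carries_qc: bool, my_height: int) -> str:
    """Decide what a node working on `my_height` does with an incoming message."""
    if msg_height == my_height:
        return "process"  # message about the block the node is working on
    if carries_qc and msg_height > my_height:
        return "store"    # assumed exception: keep QCs for future blocks
    return "ignore"       # everything else is dropped


# A node working on block 9:
print(classify_message(9, False, 9))
print(classify_message(10, True, 9))
print(classify_message(8, True, 9))
```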

Idea #2: for a fixed height, keep proposing + voting till agreement (can do it since GST will come at some point).

  • BB style, there will be a proposer and voters. Proposer will be rotating.

Idea #3: two stages of voting. (This is the key innovation)

  • Why is one stage not enough? Because different nodes may see different voting outcomes, due to (1) Byzantine nodes; (2) asynchrony.
  • Each node will see one of three outcomes:
      - restart outcome (the voting failed from its viewpoint)
      - commit outcome (it is convinced)
      - intermediate “hedging” outcome (it is convinced, but not 100%!)

Quorum Certificates (QCs)

Preliminaries:

  • Assume PKI, all messages signed by sender + pub keys distributed before the protocol (not needed in general, but in Tendermint needed)
  • Round = interval of 4∆ timesteps (shared clock, ∆ known ⇒ all nodes know which round is now)
  • Use rotating leaders (one per round) [easy due to shared clock + permissioned setting]
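With a shared clock and a known ∆, every node can compute the current round, phase, and leader locally. A minimal sketch, assuming integer timesteps, rounds numbered from 0, and a fixed round-robin order; the function names are hypothetical.

```python
def round_and_phase(t: int, delta: int) -> tuple:
    """Map a timestep t to (round number, phase 1..4) when one round lasts 4*delta."""
    r, offset = divmod(t, 4 * delta)
    return r, offset // delta + 1


def leader_of(r: int, order: list) -> str:
    """Round-robin leader for round r over a fixed, publicly known order."""
    return order[r % len(order)]


order = ["n0", "n1", "n2", "n3"]
r, phase = round_and_phase(t=14, delta=1)  # round 3 starts at t = 12
print(r, phase, leader_of(r, order))
```

Because the computation uses only t, ∆, and the fixed order, no messages are needed to agree on who leads a round.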

Definition: a quorum certificate (QC) is a batch of ≥ 2n/3 signed votes for some block B (at some height, at some round, at some stage (1 or 2) of that round).

Lemma: any two QCs overlap in at least one honest node. [simply because they overlap in ≥n/3 nodes, and we have a bound f<n/3]

Proof: overlap is ≥ 2n/3 + 2n/3 − n = n/3 > f, qed.
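The counting argument behind the lemma can be checked numerically. The sketch below assumes that a quorum means at least 2n/3 votes and that f < n/3; the helper names are hypothetical.

```python
def quorum_size(n: int) -> int:
    """Smallest integer that is at least 2n/3 (assumed QC threshold)."""
    return (2 * n + 2) // 3


def max_faults(n: int) -> int:
    """Largest integer f with f < n/3."""
    return (n - 1) // 3


def min_overlap(n: int) -> int:
    """Fewest nodes two quorums can share: |Q1| + |Q2| - n, by inclusion-exclusion."""
    return max(0, 2 * quorum_size(n) - n)


# Two quorums always share more nodes than there are Byzantine nodes,
# so at least one shared node is honest.
for n in (4, 7, 10, 100):
    print(n, quorum_size(n), min_overlap(n), max_faults(n))
```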

Corollary: any two (⇒all) QCs for some [block # + round + stage] must agree on the block.

This is because each honest node votes only once in one referendum [block # + round + stage]; see later for pseudo-code.

Plan:

  • each node i maintains a pair of local variables (B_i, QC_i), where the second is a QC for the first
  • initially QC_i is null, and B_i = all unexecuted txs that i knows about
  • i periodically updates (B_i, QC_i) to the most recent (according to rounds/stages) block-QC pair it has heard about
  • i also saves for future use any QCs for future blocks
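One way to picture this local state is sketched below. All class, field, and function names are hypothetical; a QC is reduced to its metadata (no signatures), and "most recent" is assumed to mean lexicographic order on (round, stage), with a null QC older than everything.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QC:
    height: int
    round: int
    stage: int  # 1 or 2
    block: str


def recency(qc: Optional[QC]) -> tuple:
    """Sort key for QCs: later round first, then later stage; null QC is oldest."""
    return (-1, 0) if qc is None else (qc.round, qc.stage)


@dataclass
class NodeState:
    height: int
    block: str                # the block the node currently backs
    qc: Optional[QC] = None   # a QC for that block, or None (null)

    def adopt_if_newer(self, block: str, qc: QC) -> bool:
        """Switch to (block, qc) if qc is for our height and strictly more recent."""
        if qc.height == self.height and recency(qc) > recency(self.qc):
            self.block, self.qc = block, qc
            return True
        return False


state = NodeState(height=9, block="all unexecuted txs")
print(state.adopt_if_newer("B", QC(9, 3, 1, "B")))  # newer than null
print(state.adopt_if_newer("C", QC(9, 2, 2, "C")))  # round 2 is older than round 3
```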

The Tendermint Protocol (in pseudo-code)

Time is divided into rounds (4∆ timesteps each): _._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._

Fix a height (e.g. block #9), and a round r with leader ℓ. The round r is divided into 4 phases, and starts exactly at time 4∆r (since one round = 4∆).

Summary of the procedure

t=4∆r: (phase 1)

  • Leader ℓ updates (B_ℓ, QC_ℓ) to the most recent QC known, and proposes (B_ℓ, QC_ℓ) to all nodes (including itself)
  • Message looks like this: (round r, height (block #9), B_ℓ, QC_ℓ)

t=4∆r+∆: (phase 2)

  • If node i receives (B_ℓ, QC_ℓ) from ℓ (it might not if msg is delayed)

and

  • If QC_ℓ is not older (in terms of rounds/stages) than QC_i

then

  • Broadcast first-stage vote for B_ℓ to all nodes (including itself): (round r, height (block #9), vote for “yes”, B_ℓ)
  • Update (B_i, QC_i) := (B_ℓ, QC_ℓ)
  • Broadcast (B_ℓ, QC_ℓ)

t=4∆r+2∆: (phase 3)

  • If node i receives a supermajority, that is ≥ 2n/3 round-r stage-1 votes for B_ℓ (counting itself and the leader node ℓ)

then

  • Update (B_i, QC_i) := (B_ℓ, the QC formed by the newly received votes) (because it knows that this is the most recent QC)
  • Broadcast second-stage vote for B_ℓ
  • Broadcast (B_i, QC_i)

t=4∆r+3∆: (phase 4)

  • If node i receives ≥ 2n/3 round-r stage-2 votes for B_ℓ

then

  • Update QC_i = this QC, B_i = B_ℓ
  • Commit B_i to local history (because the block survived two stages of voting!)
  • Broadcast (B_i, QC_i)
  • Increment h_i
  • re-initialize: QC_i is null, and B_i = all unexecuted txs that i knows about.

t=4∆r+4∆: (just before the start of next round r+1)

  • If node i received in the background a stage-2 QC for block #h_i supporting a block B

then

  • commit B to local history, increment h_i

(repeat this procedure if possible)

In the background (at all times):

  • Store all QCs received for future blocks h_i+1, h_i+2, …
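The four phases can be walked through in a toy, single-height simulation. This is a sketch under strong simplifying assumptions, not the protocol itself: all nodes are honest, signatures are omitted, a QC is reduced to its (round, stage), the threshold is taken to be at least 2n/3 votes, and delivery is all-or-nothing (nodes named in `on_time` receive the proposal and every vote from other on-time nodes; the rest receive nothing this round). Re-initialization after a commit and the background steps are left out. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    block: str                             # the block the node backs (B_i)
    qc: Optional[Tuple[int, int]] = None   # (round, stage) of QC_i; None = null
    committed: Optional[str] = None


def recency(qc):
    return (-1, 0) if qc is None else qc   # null QC is older than any real QC


def is_quorum(votes: int, n: int) -> bool:
    return 3 * votes >= 2 * n              # assumed threshold: at least 2n/3


def run_round(nodes, leader, r, on_time):
    """Run round r at one height; return what each node has committed."""
    n = len(nodes)
    # Phase 1: the leader proposes its current block-QC pair.
    proposal_block, proposal_qc = leader.block, leader.qc
    # Phase 2: a node votes only if the proposal's QC is not older than its own.
    stage1_votes = 0
    for node in nodes:
        if node.name in on_time and recency(proposal_qc) >= recency(node.qc):
            node.block, node.qc = proposal_block, proposal_qc
            stage1_votes += 1
    # Phase 3: a stage-1 quorum becomes the newest QC; cast stage-2 votes.
    stage2_votes = 0
    if is_quorum(stage1_votes, n):
        for node in nodes:
            if node.name in on_time:
                node.block, node.qc = proposal_block, (r, 1)
                stage2_votes += 1
    # Phase 4: a stage-2 quorum means the block is committed.
    if is_quorum(stage2_votes, n):
        for node in nodes:
            if node.name in on_time:
                node.qc, node.committed = (r, 2), proposal_block
    return [node.committed for node in nodes]


nodes = [Node(f"n{i}", f"txs{i}") for i in range(4)]
print(run_round(nodes, nodes[0], r=0, on_time={"n0", "n1", "n2"}))
```

In the usage example, three of four nodes are on time, which is enough for both quorums, so those three commit the leader's block while the fourth does not commit in this round.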

From the notes:

[image: pseudo-code of the protocol from the notes]

Summary:

[image: summary of the protocol]

Tendermint: Proof of Consistency

Theorem: Tendermint satisfies SMR consistency (for a given block #, all honest nodes commit the same block).

Proof: Fix a height h (e.g., block #9).

We need to prove that there cannot be two stage-2 QCs for different blocks at height 9, whatever their rounds. If the rounds of these two QCs are the same then we are done by the overlap lemma!

Let r = first round in which [>n/3 honest nodes = set S] cast stage-2 votes for the same block B*. This is of course a prerequisite for the creation of a stage-2 QC (because ≥2n/3 voted, <n/3 Byzantine ⇒ >n/3 honest voted).

Intuition: We want to argue that stage-2 QCs for block #9 can only be for block B*. These [>n/3 honest nodes = set S] simply “lock in” their vote for B*: they are never going to change it in stage 1 because of line 6 in the pseudo-code pic above, and therefore there will never be ≥2n/3 votes in stage 2 for a block ≠ B*.

Formally: (by induction)

  • At the end of round r: (i) B_i = B* for all i ∈ S [by pseudo-code: in round-r 3rd phase >n/3 honest nodes cast votes for B* ⇒ in round-r 4th phase there cannot be QCs for other blocks. There also cannot be QCs from the past arriving in the background, because round r is the “first” such round by definition] (ii) QC_i is from round-r stage-1 or later, for all i ∈ S [obvious from the pseudo-code] (iii) all QCs for other blocks are from round r-1 or earlier [semi-obvious from the pseudo-code]
  • In round r+1 no nodes from S change their mind: (i) + (ii) + (iii) all still hold! [(iii) at the end of round r ⇒ the leader can propose a different block only with a QC from round r-1 or earlier, which is older than the QCs held by S ⇒ the >n/3 nodes of S don’t update and don’t vote in round-(r+1) stage 1 ⇒ no QC for a different block can be formed in round r+1, qed]
  • In the future rounds: same.
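The counting step inside the induction can be checked directly: once more than n/3 nodes refuse to vote for a different block, the remaining nodes are too few to form a QC. A small sketch, assuming a QC needs at least 2n/3 votes; the function name is hypothetical.

```python
def qc_possible_without(n: int, locked: int) -> bool:
    """Can a QC form if `locked` of the n nodes never vote for the block?

    Worst case: every other node (honest or Byzantine) does vote for it.
    """
    votes_available = n - locked
    return 3 * votes_available >= 2 * n  # assumed threshold: at least 2n/3


# With strictly more than n/3 nodes locked, no competing QC can form.
for n in (4, 7, 10, 100):
    locked = n // 3 + 1  # smallest integer strictly greater than n/3
    print(n, locked, qc_possible_without(n, locked))

# The strict inequality matters: with exactly n/3 locked, a QC is still possible.
print(qc_possible_without(3, 1))
```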

[…see notes for more details…]

qed.

Tendermint: Proof of Liveness

Claim: Tendermint satisfies SMR liveness (eventually).

(Our SMR liveness property is going to be weaker than the one from Lecture 4. Old, strong liveness: every tx submitted to one honest node gets included. New, weaker liveness: every tx submitted to all honest nodes gets included. This is not a big deal, since honest nodes can communicate via some gossip protocol and share valid txs with each other.)

Proof: Consider a tx T known to all honest nodes.

Fast forward to a pair r and r+1 of consecutive rounds after GST+∆ with honest leaders (this exists, since f<n/3)
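The existence claim can be sanity-checked by brute force for small n. The reasoning, under the assumption of a fixed round-robin order over all n nodes: there are n cyclically consecutive leader pairs, each Byzantine node spoils at most two of them, and 2f < 2n/3 < n, so some pair is fully honest. The function names below are hypothetical.

```python
from itertools import combinations


def has_consecutive_honest_leaders(n: int, byzantine: set) -> bool:
    """True if some position i and its cyclic successor are both honest."""
    return any(
        i not in byzantine and (i + 1) % n not in byzantine for i in range(n)
    )


def holds_for_all_placements(n: int) -> bool:
    """Try every placement of the maximum number f < n/3 of Byzantine nodes."""
    f = (n - 1) // 3
    return all(
        has_consecutive_honest_leaders(n, set(bad))
        for bad in combinations(range(n), f)
    )


print(all(holds_for_all_placements(n) for n in range(1, 13)))

# For contrast, with half the nodes Byzantine and alternating in the order,
# no two consecutive leaders are honest.
print(has_consecutive_honest_leaders(6, {0, 2, 4}))
```

Since the order repeats every n rounds, such a pair also occurs after any given time, in particular after GST+∆.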

Lemma: at the start of round r, every honest node is working on block # h or h+1 (for some h). [Roughly, this is because after committing a block, honest nodes broadcast the stage-2 QC for that block, and since we are post-GST, those broadcasts do arrive at the other honest nodes!]

Proof: […see notes…]

Definition: a round is clean if (i) it is post-GST; (ii) it has an honest leader; (iii) all honest nodes are working on the same block #; (iv) after the update in the 1st phase, the leader’s QC is at least as recent as that of any honest node

Lemma: clean round ⇒ all honest nodes commit the block proposed by the leader. [proof is by inspecting pseudo-code + remembering we are in post-GST…]

Case 1: all honest nodes start round r working on block # h+1. ⇒ round r is clean, commits a block including T

Case 2: all honest nodes start round r working on block # h. ⇒ round r is clean, commits some block ⇒ round r+1 is clean, commits a block including T

Case 3: the leader is behind. […]

Case 4: the leader is ahead. […]

Can we do better?

  • Can’t increase # of Byzantine nodes (without compromising elsewhere)
  • Can’t relax partial synchrony to asynchrony
  • Can’t have both liveness and safety before GST

Alternative trade-offs:

  • Longest chain consensus favors liveness over safety! (see next Lecture)

Same guarantees (fault-tolerance, consistency, eventual liveness) but better performance (smaller communication complexity, fewer rounds, faster recovery time post-GST, etc):

  • see HotStuff (Facebook Diem)
  • Casper FFG (now used by Ethereum)