Lecture 9. The Tendermint Protocol.

Resources:
HW:

Consensus with Partial Synchrony

Recall: Partially Synchronous model.

|t=0|____Asynch phase____|t=GST, unknown|____Sync phase, ∆ msg delay ______

Goals: (for SMR)

  • consistency (always)
  • liveness (eventually, after GST)

Lecture 8: if f ≥ n/3, no protocol satisfies these goals (even with PKI).

Permissioned setting, by network model:

  • Synchrony: ✅ PKI, any f<n ⇒ BB protocol; ❌ no PKI, f≥n/3 ⇒ no BB protocol
  • Partial synchrony: ✅ ?; ❌ f≥n/3 ⇒ no BA protocol
  • Asynchrony: ❌ f=1 ⇒ no BA protocol

Tendermint: [Buchman-Kwon-Milosevic 2018] In partial synchrony, there exists an SMR protocol that, when f < n/3, satisfies consistency (always) + liveness (eventually).

Tendermint: High-Level Ideas

(Remark: with these consensus protocols, the devil is usually in the details. Intuition is frequently misleading in distributed systems, so it is always worth insisting on a formal proof.)

Idea #1: iterated single-shot consensus. (output of each = ordered list of txs, “block”)

  • each node $i$ maintains its own “height” $h_i$ (latest block it knows) [in the asynchronous phase, heights might differ across nodes]
  • every single message is annotated with “what is the next block that I am trying to figure out”.
  • a node working on block 9 ignores all messages about other blocks (with a tiny exception, described later)

Idea #2: for a fixed height, keep proposing + voting until agreement (this works because GST will come at some point).

  • BB style: there is a proposer and there are voters. The proposer rotates from round to round.

Idea #3: two stages of voting. (This is the key innovation)

  • Why is one stage not enough? Because different nodes may see different voting outcomes, due to (1) Byzantine nodes and (2) asynchrony.
  • Each node ends up with one of three outcomes: restart (the vote failed from its viewpoint), commit (it is convinced), or an intermediate “hedging” outcome (it is convinced, but not 100%!)

Quorum Certificates (QCs)

Preliminaries:

  • Assume PKI, all messages signed by sender + pub keys distributed before the protocol (not needed in general, but in Tendermint needed)
  • Round = interval of 4∆ timesteps (shared clock, ∆ known ⇒ all nodes know which round is now)
  • Use rotating leaders (one per round) [easy due to shared clock + permissioned setting]
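The shared-clock assumption means every node can compute the current round, phase, and leader locally. A minimal sketch, assuming rounds are indexed from 0 at t = 0 and that leaders rotate round-robin by node index (the notes only say "rotating"; the function names are illustrative):

```python
def current_round(t, delta):
    """Index of the round containing time t; each round lasts 4*delta."""
    return int(t // (4 * delta))


def current_phase(t, delta):
    """Phase (1..4) within the current round; each phase lasts delta."""
    return int((t % (4 * delta)) // delta) + 1


def leader_of(round_index, n):
    """Round-robin leader over nodes 0..n-1 (an assumed rotation rule)."""
    return round_index % n


# With delta = 2, time t = 13 falls in round 1 (which spans [8, 16)), phase 3.
print(current_round(13, 2), current_phase(13, 2), leader_of(1, 4))
```

Any deterministic rule that all nodes agree on would serve equally well for the leader; round-robin is just the simplest choice.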

Definition: a quorum certificate (QC) is a batch of $\geq \frac{2}{3} n$ signed votes (from distinct nodes) for some block $B$ (at some height, at some round, at some stage (1 or 2) of that round).

Lemma: any two QCs overlap in at least one honest node. [simply because they overlap in ≥n/3 nodes, and we have a bound f<n/3]

Proof: the overlap is $\geq \frac{2}{3} n + \frac{2}{3} n - n = \frac{1}{3} n > f$, qed.
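The counting argument can be checked numerically for small n. The sketch below uses the smallest possible quorum, ceil(2n/3), and the largest f with f < n/3; the helper names are illustrative and not part of the protocol.

```python
def quorum_size(n):
    """Smallest integer number of votes that is >= 2n/3."""
    return -(-2 * n // 3)  # ceiling division


def max_faulty(n):
    """Largest f satisfying f < n/3."""
    return (n - 1) // 3


def min_overlap(n):
    """Two quorums inside a set of n nodes share at least 2q - n members."""
    return 2 * quorum_size(n) - n


for n in range(1, 200):
    # The overlap always exceeds f, so it contains at least one honest node.
    assert min_overlap(n) > max_faulty(n)

print(quorum_size(4), max_faulty(4), min_overlap(4))  # 3 1 2
```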

Corollary: any two (⇒all) QCs for some [block # + round + stage] must agree on the block.

This is because each honest node votes only once in one referendum [block # + round + stage]; see later for pseudo-code.
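To make the definition concrete, here is a minimal sketch of a QC check. It assumes signatures have already been verified (signature checking is omitted), and the `Vote` fields are an illustrative encoding, not the message format from the notes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: int   # node id (assumed already authenticated via its signature)
    height: int
    round: int
    stage: int   # 1 or 2
    block: str


def is_quorum_certificate(votes: Iterable[Vote], n: int) -> bool:
    """True if the votes form a QC: one referendum, one block, >= 2n/3 distinct voters."""
    votes = list(votes)
    if not votes:
        return False
    first = votes[0]
    target = (first.height, first.round, first.stage, first.block)
    if any((v.height, v.round, v.stage, v.block) != target for v in votes):
        return False
    distinct_voters = {v.voter for v in votes}  # a repeated voter counts once
    return 3 * len(distinct_voters) >= 2 * n    # integer form of count >= 2n/3
```

Counting distinct voters matters: a Byzantine node that sends the same vote many times must not inflate the tally.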

Plan:

  • each node $i$ maintains a pair of local variables $(B_i, QC_i)$, where the second is a QC for the first
  • initially $QC_i$ is null, and $B_i$ = all unexecuted txs that $i$ knows about
  • node $i$ periodically updates the pair to the most recent (according to rounds/stages) block-QC pair it has heard about
  • node $i$ also saves, for future use, any QCs for future blocks
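"Most recent according to rounds/stages" can be read as a lexicographic comparison on (round, stage), with a null QC older than everything. A minimal sketch under that reading; the `QC` tuple here only carries the fields needed for the comparison and is an illustrative simplification.

```python
from typing import NamedTuple, Optional


class QC(NamedTuple):
    round: int
    stage: int   # 1 or 2
    block: str   # the block this QC certifies


def recency(qc: Optional[QC]):
    """Sort key: later rounds win, then later stages; a null QC is the oldest."""
    return (-1, 0) if qc is None else (qc.round, qc.stage)


def most_recent(current: Optional[QC], candidate: Optional[QC]) -> Optional[QC]:
    """Keep whichever QC is more recent; ties keep the current one."""
    return candidate if recency(candidate) > recency(current) else current
```

For example, a stage-1 QC from round 3 replaces a stage-2 QC from round 2, but not a stage-2 QC from round 3.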

The Tendermint Protocol (in pseudo-code)

Time is divided into rounds (4∆ timesteps each): _._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._

Fix a height (e.g. block #9), and a round r with leader $\ell$. The round r is divided into 4 phases of length ∆ each, and starts exactly at time 4∆r (since one round = 4∆).

Summary of the procedure

t=4∆r: (phase 1)

  • $\ell$ updates $(B_\ell, QC_\ell)$ to the most recent block-QC pair it knows of, and proposes it to all nodes (including itself)
  • The message looks like this: (round r, height (block #9), $(B_\ell, QC_\ell)$, $sign_\ell$)

t=4∆r+∆: (phase 2)

  • If node $i$ receives $(B_\ell, QC_\ell)$ from $\ell$ (it might not, if the msg is delayed)

and

  • If $QC_\ell$ is not older (in terms of rounds/stages) than $QC_i$

then

  • Broadcast a first-stage vote for $B_\ell$ (to all nodes, including itself): (round r, height (block #9), vote for $B_\ell$ “yes”, $sign_i$)
  • Update $(B_i, QC_i) := (B_\ell, QC_\ell)$
  • Broadcast $(B_i, QC_i) = (B_\ell, QC_\ell)$
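The phase-2 rule above can be sketched as a single decision function. This is an illustration under simplifying assumptions: a QC is represented only by its (round, stage) rank, signatures are taken as already verified, and the `Proposal` fields and return value are hypothetical encodings.

```python
from typing import NamedTuple, Optional, Tuple

Rank = Optional[Tuple[int, int]]  # (round, stage) of a QC, or None for a null QC


class Proposal(NamedTuple):
    round: int
    height: int
    block: str
    qc_rank: Rank   # rank of the leader's QC
    sender: int


def rank_key(rank: Rank) -> Tuple[int, int]:
    """A null QC is older than any real QC."""
    return (-1, 0) if rank is None else rank


def phase2_vote(p: Optional[Proposal], my_height: int, my_qc_rank: Rank,
                leader: int, current_round: int):
    """Return a stage-1 vote tuple, or None if the node should stay silent."""
    if p is None:                       # proposal delayed or never sent
        return None
    if (p.sender, p.round, p.height) != (leader, current_round, my_height):
        return None                     # wrong leader, round, or height
    if rank_key(p.qc_rank) < rank_key(my_qc_rank):
        return None                     # leader's QC is older than ours
    return (current_round, my_height, 1, p.block)
```

The strict `<` implements "not older": a proposal whose QC is exactly as recent as the node's own is still accepted.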

t=4∆r+2∆: (phase 3)

  • If node $i$ receives a supermajority, that is $\geq \frac{2}{3} n$ round-r stage-1 votes for $B$ (counting its own vote and the vote of the leader $\ell$)

then

  • Update $QC_i :=$ the newly received votes, $B_i := B$ (because it knows that this is the most recent QC)
  • Broadcast a second-stage vote for $B_i$
  • Broadcast $(B_i, QC_i)$

t=4∆r+3∆: (phase 4)

  • If node $i$ receives $\geq \frac{2}{3} n$ round-r stage-2 votes for $B$

then

  • Update $QC_i :=$ this QC, $B_i := B$
  • Commit $B$ to local history (because the block $B$ survived two stages of voting!)
  • Broadcast $(B_i, QC_i)$
  • Increment $h_i$
  • Re-initialize: $QC_i$ is null, and $B_i$ = all unexecuted txs that $i$ knows about.
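Phases 3 and 4 both start with the same tally: did some block collect at least 2n/3 votes for the current stage? A minimal sketch of that tally and of the phase-4 commit step, assuming votes arrive as a map from voter id to block (so each voter is counted once); broadcasting and the re-initialization of $(B_i, QC_i)$ are left out.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def block_with_quorum(votes: Dict[int, str], n: int) -> Optional[str]:
    """Return the block holding >= 2n/3 of the votes, or None if there is none."""
    if not votes:
        return None
    block, count = Counter(votes.values()).most_common(1)[0]
    return block if 3 * count >= 2 * n else None


def phase4_step(history: List[str], height: int,
                stage2_votes: Dict[int, str], n: int) -> Tuple[List[str], int]:
    """Commit the block and advance the height if a stage-2 quorum exists."""
    block = block_with_quorum(stage2_votes, n)
    if block is None:
        return history, height           # no commit this round
    return history + [block], height + 1
```

Only the most-voted block needs checking, because two different blocks cannot both reach 2n/3 of at most n votes.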

t=4∆r+4∆: (just before the start of next round r+1)

  • If a stage-2 QC for block # $h_i$, supporting a block $B$, was received in the background

then

  • commit $B$ to local history, increment $h_i$

(repeat this procedure if possible)

In the background (at all times):

  • Store all QCs received for future blocks $h_i+1$, $h_i+2$, …
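The background storage and the end-of-round catch-up fit together as follows. This is a minimal sketch that assumes the stored stage-2 QCs have already been validated and keeps only the certified block per height; the function names are illustrative.

```python
from typing import Dict, List


def store_qc(stored: Dict[int, str], my_height: int, qc_height: int, block: str) -> None:
    """Remember a stage-2 QC for the current or a future height; drop old ones."""
    if qc_height >= my_height:
        stored[qc_height] = block


def catch_up(history: List[str], height: int, stored: Dict[int, str]) -> int:
    """Commit stored blocks for consecutive heights; return the new height."""
    while height in stored:
        history.append(stored.pop(height))
        height += 1
    return height


stored: Dict[int, str] = {}
for h, b in [(9, "B9"), (10, "B10"), (12, "B12"), (8, "B8")]:
    store_qc(stored, 9, h, b)
history: List[str] = []
print(catch_up(history, 9, stored), history)  # 11 ['B9', 'B10']
```

The loop stops at the first gap: the QC for height 12 stays stored until height 11 has been committed.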

From the notes:

image

Summary:

image

Tendermint: Proof of Consistency

Theorem: Tendermint satisfies SMR consistency (for a given block #, all honest nodes commit the same block).

Proof: Fix a height h (e.g., block #9).

We need to prove that there cannot be two stage-2 QCs for block #9 (from any rounds) that support different blocks. If the two QCs are from the same round, then we are done by the overlap lemma!

Let r = the first round in which [>n/3 honest nodes = set S] cast stage-2 votes for the same block $B^*$. This is of course a prerequisite for the creation of a stage-2 QC (because ≥2n/3 voted, <n/3 are Byzantine ⇒ >n/3 honest nodes voted).
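Spelled out, the parenthetical count is the following one-line bound (a worked restatement, using only that a QC has at least 2n/3 voters and that f < n/3):

```latex
\[
  \#\{\text{honest voters in a stage-2 QC}\}
  \;\geq\; \tfrac{2}{3}n - f
  \;>\; \tfrac{2}{3}n - \tfrac{1}{3}n
  \;=\; \tfrac{1}{3}n .
\]
```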

Intuition: We want to argue that stage-2 QCs for block #9 can only be for block $B^*$. These [>n/3 honest nodes = set S] simply “lock in” their vote for $B^*$: they are never going to change it in stage 1, because of line 6 in the pseudo-code picture above, and therefore there will never be ≥2n/3 votes in stage 2 for a block ≠ $B^*$.

Formally: (by induction)

  • At the end of round r:
      (i) $B_i = B^*$ for all $i \in S$ [by the pseudo-code: in the 3rd phase of round r, >n/3 honest nodes cast votes for $B^*$ ⇒ in the 4th phase of round r there cannot be QCs for other blocks. There also cannot be earlier QCs received in the background, because round r is the “first” such round by definition]
      (ii) $QC_i$ is from round-r stage-1 or later [obvious from the pseudo-code]
      (iii) all QCs for other blocks are from round r-1 or earlier [semi-obvious from the pseudo-code]
  • In round r+1 no nodes from S change their mind: (i) + (ii) + (iii) all still hold! [by (ii) + (iii), any QC the leader can propose for a different block is older than $QC_i$ for every $i \in S$ ⇒ the >n/3 nodes of S don't update → don't vote in round-(r+1) stage-1 ⇒ fewer than 2n/3 nodes vote for a different block ⇒ no QC for a different block can be formed in round r+1, qed]
  • In future rounds: same.

[…see notes for more details…]

qed.

Tendermint: Proof of Liveness

Claim: Tendermint satisfies SMR liveness (eventually).

(Our SMR liveness property is going to be weaker than the one from Lecture 4. Old, strong liveness: every tx submitted to one honest node gets included. New, weaker liveness: every tx submitted to all honest nodes gets included. This is not a big deal, since honest nodes can run a gossip protocol to share valid txs with each other.)

Proof: Consider a tx T known to all honest nodes.

Fast forward to a pair $r_1$ and $r_2$ of consecutive rounds after GST+∆ with honest leaders $\ell_1, \ell_2$ (such a pair exists, since f<n/3)
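The existence claim can be sanity-checked by brute force for small n. The sketch assumes round-robin leader rotation over node indices (the notes only say leaders rotate) and tries every placement of the maximum number of Byzantine nodes; fewer Byzantine nodes only make the claim easier.

```python
from itertools import combinations


def has_consecutive_honest(n, byzantine):
    """True if some two cyclically adjacent leader slots are both honest."""
    return any(i not in byzantine and (i + 1) % n not in byzantine
               for i in range(n))


def check_all_placements(n):
    """Check every placement of the largest f with f < n/3 among n nodes."""
    f = (n - 1) // 3
    return all(has_consecutive_honest(n, set(bad))
               for bad in combinations(range(n), f))


# Holds for every n tried: avoiding two adjacent honest slots would need f >= n/2.
print(all(check_all_placements(n) for n in range(2, 13)))  # True
```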

Lemma: at the start of round $r_1$, every honest node is working on block # h or h+1 (for some h). [Roughly, this is because after committing a block, an honest node broadcasts the stage-2 QC for that block, and since we are post-GST, those broadcasts do reach the other honest nodes!]

Proof: […see notes…]

Definition: a round is clean if (i) it is post-GST; (ii) it has an honest leader; (iii) all honest nodes are working on the same block #; (iv) after the update in the 1st phase, the leader’s QC is at least as recent as that of every honest node

Lemma: clean round ⇒ all honest nodes commit the block proposed by the leader. [proof is by inspecting pseudo-code + remembering we are in post-GST…]

Case 1: all honest nodes start round $r_1$ working on block # h+1. ⇒ round $r_1$ is clean, commits a block including T

Case 2: all honest nodes start round $r_1$ working on block # h. ⇒ round $r_1$ is clean, commits some block ⇒ round $r_2$ is clean, commits a block including T

Case 3: the leader is behind. […]

Case 4: the leader is ahead. […]

Can we do better?

  • Can’t increase # of Byzantine nodes (without compromising elsewhere)
  • Can’t relax partial synchrony to asynchrony
  • Can’t have both liveness and safety before GST

Alternative trade-offs:

  • Longest chain consensus favors liveness over safety! (see next Lecture)

Same guarantees (fault-tolerance, consistency, eventual liveness) but better performance (smaller communication complexity, fewer rounds, faster recovery time post-GST, etc):

  • see HotStuff (Facebook Diem)
  • Casper FFG (now used by Ethereum)