Lecture 9. The Tendermint Protocol.

Consensus with Partial Synchrony

Recall: Partially Synchronous model.

|t=0|____Asynch phase____|t=GST, unknown|____Sync phase, ∆ msg delay ______→

Goals: (for SMR)

consistency (always)
liveness (eventually, after GST)

Lecture 8: if f ≥ n/3, no protocol satisfies these goals (even with PKI).

	Synchrony	Partial synchrony	Asynchrony
Permissioned	✅ PKI, any f<n ⇒ BB protocol; ❌ no PKI, f≥n/3 ⇒ no BB protocol	✅ ?; ❌ f≥n/3 ⇒ no BA protocol	❌ f=1 ⇒ no BA protocol

Tendermint: [Buchman-Kwon-Milosevic 2018] In partial synchrony, there exists an SMR protocol that, when f < n/3, satisfies consistency (always) + liveness (eventually).

Tendermint: High-Level Ideas

(Remark: with these consensus protocols the devil is usually in the details. Intuition is frequently misleading in distributed systems, so it is always good to require a formal proof.)

Idea #1: iterated single-shot consensus. (output of each = ordered list of txs, “block”)

each node $i$ maintains its own “height” $h_i$ (latest block it knows) [in asynchronous phase, might be different for different nodes]
every single message is annotated with “what is the next block that I am trying to figure out”.
if a node is working on block 9, it ignores all messages about other blocks (with a tiny exception, described later)

Idea #2: for a fixed height, keep proposing + voting till agreement (can do it since GST will come at some point).

BB style, there will be a proposer and voters. Proposer will be rotating.

Idea #3: two stages of voting. (This is the key innovation)

Why is one stage not enough? Because different nodes may see different voting outcomes, due to (1) Byzantine nodes; (2) asynchrony.
A node will see one of three outcomes: restart (the voting failed from its viewpoint), commit (it is convinced), or an intermediate “hedging” outcome (it is convinced, but not 100%).

Quorum Certificates (QCs)

Preliminaries:

Assume PKI: all messages are signed by the sender, and public keys are distributed before the protocol starts (not needed in general, but Tendermint needs it)
Round = interval of 4∆ timesteps (shared clock, ∆ known ⇒ all nodes know which round it is now)
Use rotating leaders (one per round) [easy due to shared clock + permissioned setting]

Definition: a quorum certificate (QC) is a batch of ≥ $\frac 2 3 n$ signed votes for some block B (at some height, at some round, at some stage (1 or 2) of that round).

Lemma: any two QCs overlap in at least one honest node. [simply because they overlap in ≥n/3 nodes, and we have a bound f<n/3]

Proof: overlap is $\geq \frac 2 3 n + \frac 2 3 n - n = \frac 1 3 n >f$ , qed.
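
The overlap bound can be sanity-checked by brute force for small n. This is only a sketch: it assumes a quorum is any set of at least ⌈2n/3⌉ nodes and takes f to be the largest integer strictly below n/3. The helper names (`quorum_size`, `max_faults`, `min_overlap`) are made up for illustration.

```python
from itertools import combinations

def quorum_size(n: int) -> int:
    """Smallest integer q with q >= 2n/3 (integer arithmetic, no floats)."""
    return (2 * n + 2) // 3

def max_faults(n: int) -> int:
    """Largest integer f with f < n/3."""
    return (n - 1) // 3

def min_overlap(n: int) -> int:
    """Smallest intersection of two minimum-size quorums, by enumeration.

    Larger quorums can only overlap more, so minimum-size ones suffice.
    """
    q = quorum_size(n)
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)

if __name__ == "__main__":
    for n in range(3, 9):
        print(n, quorum_size(n), max_faults(n), min_overlap(n))
```

For every n tried, the minimum overlap equals 2q - n and exceeds f, so at least one node in the overlap is honest.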

Corollary: any two (⇒all) QCs for some [block # + round + stage] must agree on the block.

This is because each honest node votes only once in one referendum [block # + round + stage]; see later for pseudo-code.

Plan:

each node $i$ maintains a pair of local variables $(B_i,QC_i)$, where the second is a QC for the first
initially $QC_i$ is null, and $B_i$ = all unexecuted txs that $i$ knows about
periodically, $i$ updates to the most recent (according to rounds/stages) block-QC pair it has heard about
$i$ also saves, for future use, any QCs for future blocks
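
One possible way to represent this local state, as a minimal sketch. Assumptions: signatures are omitted, a block is just a string, and "more recent" means a lexicographically larger (round, stage) pair. The class and method names are hypothetical, not taken from the Tendermint code base.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class QC:
    height: int   # block number the votes refer to
    round: int    # round in which the votes were cast
    stage: int    # 1 or 2
    block: str    # the block the votes support

def recency(qc: Optional[QC]) -> Tuple[int, int]:
    """Order QCs by (round, stage); the null QC is older than any real one."""
    return (-1, 0) if qc is None else (qc.round, qc.stage)

@dataclass
class NodeState:
    height: int               # block number this node is working on
    block: str                # B_i
    qc: Optional[QC] = None   # QC_i, initially null

    def maybe_update(self, block: str, qc: QC) -> bool:
        """Adopt (block, qc) if it is for our height and strictly more recent."""
        if (qc.height == self.height and qc.block == block
                and recency(qc) > recency(self.qc)):
            self.block, self.qc = block, qc
            return True
        return False
```

For example, a node holding a round-2 stage-1 QC would ignore a round-1 stage-2 QC but adopt a round-2 stage-2 QC.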

The Tendermint Protocol (in pseudo-code)

Time is divided into rounds (4∆ timesteps each): _._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._|_._._._

Fix a height (e.g. block #9), and a round r with leader $\ell$ . The round r is divided into 4 phases, and starts exactly at time 4∆r (since one round = 4∆).
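
Since the clock is shared and ∆ is known, every node can compute the current round, phase, and leader locally. A minimal sketch follows, assuming round-robin leader rotation (leader of round r is node r mod n); the notes only say leaders rotate, so that rule and the function name `schedule` are assumptions.

```python
def schedule(t: int, delta: int, n: int):
    """Return (round, phase, leader) at time t.

    Round r covers times [4*delta*r, 4*delta*(r+1)); phases are numbered 1..4.
    """
    round_len = 4 * delta
    r = t // round_len
    phase = (t - r * round_len) // delta + 1
    leader = r % n  # assumption: round-robin rotation
    return r, phase, leader

if __name__ == "__main__":
    # With delta = 5 and n = 4 nodes, time 65 falls in round 3, phase 2.
    print(schedule(65, 5, 4))
```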

Summary of the procedure

t=4∆r: (phase 1)

$\ell$ updates $(B_\ell,QC_\ell)$ to the most recent block-QC pair it knows, then proposes it to all nodes (including itself)
Message looks like this: (round r, height (block #9), $(B_\ell,QC_\ell)$ , $sign_\ell$ )

t=4∆r+∆: (phase 2)

If node $i$ receives $(B_\ell,QC_\ell)$ from $\ell$ (it might not if msg is delayed)

and

If $QC_\ell$ is not older (in terms of rounds/stages) than $QC_i$

then

Broadcast first-stage vote for $B_\ell$ to all nodes (including itself): (round r, height (block #9), vote for $B_\ell$ “yes”, $sign_i$ )
Update $(B_i,QC_i):=(B_\ell,QC_\ell)$
Broadcast $(B_i,QC_i)=(B_\ell,QC_\ell)$
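
The phase-2 rule can be sketched as a pure function. Assumptions: a QC is reduced to its (round, stage) pair, `None` stands for the null QC, and networking and signatures are left out. The names `not_older` and `phase2` are illustrative only.

```python
from typing import Optional, Tuple

Recency = Tuple[int, int]  # (round, stage) of a QC; None stands for the null QC

def not_older(qc_leader: Optional[Recency], qc_mine: Optional[Recency]) -> bool:
    """True if the leader's QC is at least as recent as ours."""
    if qc_mine is None:
        return True
    if qc_leader is None:
        return False
    return qc_leader >= qc_mine

def phase2(proposal, my_block, my_qc):
    """Return (block voted for or None, new B_i, new QC_i).

    proposal is (block, qc) from the leader, or None if it did not arrive.
    """
    if proposal is None:
        return None, my_block, my_qc      # nothing received: no vote
    block, qc = proposal
    if not not_older(qc, my_qc):
        return None, my_block, my_qc      # leader's QC is stale: stay put
    return block, block, qc               # vote and adopt the leader's pair
```

A node holding a round-3 QC refuses a proposal backed by a round-1 QC, which is the behavior the consistency proof relies on.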

t=4∆r+2∆: (phase 3)

If node $i$ receives supermajority, that is $\geq \frac 2 3 n$ round-r stage-1 votes for $B$ (counting itself and the leader node $\ell$ )

then

Update $QC_i:=$ the newly received votes, $B_i:=B$ (because it knows that this is the most recent QC)
Broadcast second-stage vote for $B_i$
Broadcast $(B_i,QC_i)$

t=4∆r+3∆: (phase 4)

If node $i$ receives $\geq \frac 2 3 n$ round-r stage-2 votes for $B$

then

Update $QC_i:=$ this QC, $B_i:=B$
Commit $B$ to local history (because the block $B$ survived two stages of voting!)
Broadcast $(B_i,QC_i)$
Increment $h_i$
re-initialize: $QC_i$ is null, and $B_i$ = all unexecuted txs that $i$ knows about.
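
Phases 3 and 4 both come down to the same tally: look for a block with at least 2n/3 votes in the current round and stage. A minimal sketch, assuming votes have already been signature-checked and deduplicated into one vote per voter; `find_quorum` and `phase4` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Optional

def find_quorum(votes: Dict[int, str], n: int) -> Optional[str]:
    """votes maps voter id -> block voted for (at most one vote per voter).

    Returns the block holding >= 2n/3 votes, or None. At most one block can
    qualify, since two such blocks would need 4n/3 > n distinct voters.
    """
    for block, count in Counter(votes.values()).items():
        if 3 * count >= 2 * n:   # count >= 2n/3 without floating point
            return block
    return None

def phase4(stage2_votes: Dict[int, str], n: int, history: List[str]) -> Optional[str]:
    """Commit the block backed by a stage-2 quorum, if there is one."""
    block = find_quorum(stage2_votes, n)
    if block is not None:
        history.append(block)
    return block
```

With n = 4, three matching votes form a quorum, while two do not.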

t=4∆r+4∆: (just before the start of next round r+1)

If received in the background a stage-2 QC for block # $h_i$ supporting a block B

then

commit B to a local history, increment $h_i$

(repeat this procedure if possible)

In the background (at all times):

Store all QCs received for future blocks $h_i+1$ , $h_i+2$ , …
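
The catch-up step at the end of a round, together with the background storage, can be sketched as follows. Assumption: stored stage-2 QCs are kept in a dict from block number to the certified block. The function name `catch_up` is made up for this example.

```python
from typing import Dict, List

def catch_up(height: int, history: List[str], stored: Dict[int, str]) -> int:
    """Commit consecutive blocks for which a stage-2 QC was stored.

    stored maps block number -> block certified by a stage-2 QC.
    Returns the new height (the next block number to work on).
    """
    while height in stored:
        history.append(stored.pop(height))  # commit and drop the used QC
        height += 1
    return height

if __name__ == "__main__":
    history: List[str] = []
    stored = {9: "B9", 10: "B10", 12: "B12"}
    print(catch_up(9, history, stored), history, stored)
```

In the example, the node commits blocks 9 and 10 and stops at height 11, because no QC for block 11 is known yet; the QC for block 12 stays stored.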

From the notes:

Summary:

Tendermint: Proof of Consistency

Theorem: Tendermint satisfies SMR consistency (for a given block #, all honest nodes commit the same block).

Proof: Fix a height h (e.g., block #9).

We need to prove that there cannot be two stage-2 QCs for block #9 (from any rounds) that support different blocks. If the rounds of these two QCs are the same, then we are done by the overlap lemma!

Let r = the first round in which [>n/3 honest nodes = set S] cast stage-2 votes for the same block $B^*$ . Such a round is of course a prerequisite for the creation of any stage-2 QC (because ≥2n/3 voted, <n/3 Byzantine ⇒ >n/3 honest voted).

Intuition: We want to argue that stage-2 QCs for block #9 can only be for block $B^*$ . These [>n/3 honest nodes = set S] simply “lock in” their vote for $B^*$: they will never vote in stage 1 for a proposal whose QC is older than their own (line 6 in the pseudo-code pic above), and therefore there will never be ≥2n/3 votes in stage 2 for a block ≠ $B^*$ .

Formally: (by induction)

At the end of round r: (i) $B_i=B^*$ for all $i\in S$ [by the pseudo-code: in phase 3 of round r, >n/3 honest nodes cast votes for $B^*$ ⇒ in phase 4 of round r there cannot be QCs for other blocks. Nor can there be QCs from the past arriving in the background, because round r is the “first” such round by definition]; (ii) $QC_i$ is from round-r stage-1 or later for all $i\in S$ [obvious from the pseudo-code]; (iii) all QCs for other blocks are from round r-1 or earlier [semi-obvious from the pseudo-code].
In round r+1, no node in S changes its mind, so (i) + (ii) + (iii) still hold. [By (iii) at the end of round r, any QC the leader can propose for a different block is older than $QC_i$ for every $i\in S$ ⇒ the >n/3 nodes of S don’t update, hence don’t vote, in round-(r+1) stage 1 ⇒ no QC for a different block can be formed in round r+1, qed]
In future rounds: same.
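
The two counting steps used above, written out (using only $f<\frac 1 3 n$ and the QC threshold $\frac 2 3 n$; the first chain lower-bounds the number of honest voters behind any stage-2 QC, the second upper-bounds the votes a block ≠ $B^*$ can collect once S has locked in):

$$
\#\{\text{honest stage-2 voters}\} \;\geq\; \tfrac{2}{3}n - f \;>\; \tfrac{2}{3}n - \tfrac{1}{3}n \;=\; \tfrac{1}{3}n,
\qquad
n - |S| \;<\; n - \tfrac{1}{3}n \;=\; \tfrac{2}{3}n .
$$

So every stage-2 QC requires a set like S to exist, and the nodes outside S (honest or Byzantine) are too few to form a QC on their own.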

[…see notes for more details…]

qed.

Tendermint: Proof of Liveness

Claim: Tendermint satisfies SMR liveness (eventually).

(Our SMR liveness property is going to be weaker than the one from Lecture 4. Old, strong liveness: every tx submitted to one honest node gets included. New, weaker liveness: every tx submitted to all honest nodes gets included. This is not a big deal, since honest nodes can communicate via some gossip protocol and share valid txs with each other.)

Proof: Consider a tx T known to all honest nodes.

Fast-forward to a pair $r_1$ and $r_2$ of consecutive rounds after GST+∆ with honest leaders $\ell_1,\ell_2$ (such a pair exists, since f<n/3)
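
The existence of two consecutive honest leaders can be checked by brute force for small n. This sketch assumes round-robin rotation (leader of round r is node r mod n), which the notes do not spell out, and checks every placement of f = the largest integer below n/3 Byzantine nodes. Function names are illustrative.

```python
from itertools import combinations

def has_honest_pair(n: int, byzantine: set) -> bool:
    """Is there a round r whose leader and next leader are both honest?"""
    return any(r % n not in byzantine and (r + 1) % n not in byzantine
               for r in range(n))

def always_has_honest_pair(n: int) -> bool:
    """Check every Byzantine set of the maximum size f < n/3.

    Smaller Byzantine sets are only easier, so the maximum size suffices.
    """
    f = (n - 1) // 3
    return all(has_honest_pair(n, set(bad))
               for bad in combinations(range(n), f))

if __name__ == "__main__":
    print(all(always_has_honest_pair(n) for n in range(3, 11)))
```

The underlying reason is a counting argument: each Byzantine node spoils at most 2 of the n cyclically adjacent leader pairs, and 2f < 2n/3 < n, so some pair survives.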

Lemma: at the start of round $r_1$ , every honest node is working on block # h or h+1. [Roughly, this is because after committing a block, honest nodes broadcast the stage-2 QC for that block, and since we are post-GST, those broadcasts do arrive at the other honest nodes!]

Proof: […see notes…]

Definition: a round is clean if (i) it is post-GST; (ii) it has an honest leader; (iii) all honest nodes are working on the same block #; (iv) after the update in the 1st phase, the leader’s QC is at least as recent as that of any honest node

Lemma: clean round ⇒ all honest nodes commit the block proposed by the leader. [proof is by inspecting pseudo-code + remembering we are in post-GST…]

Case 1: all honest nodes start round $r_1$ working on block # h+1. ⇒ round $r_1$ is clean, commits block including T

Case 2: all honest nodes start round $r_1$ working on block # h. ⇒ round $r_1$ is clean, commits some block ⇒ round $r_2$ is clean, commits block including T

Case 3: the leader is behind. […]

Case 4: the leader is ahead. […]

Can we do better?

Can’t increase # of Byzantine nodes (without compromising elsewhere)
Can’t relax partial synchrony to asynchrony
Can’t have both liveness and safety before GST

Alternative trade-offs:

Longest-chain consensus favors liveness over safety! (see next lecture)

Same guarantees (fault-tolerance, consistency, eventual liveness) but better performance (smaller communication complexity, fewer rounds, faster recovery time post-GST, etc.):

HotStuff (Facebook Diem)
Casper FFG (now used by Ethereum)