*Resources:*

*HW:*

## Consensus with Partial Synchrony

__Recall:__ Partially Synchronous model.

- from t = 0 until t = GST (GST is *unknown* to the nodes): **asynchronous phase**
- from t = GST onward: **synchronous phase**, every message delivered within ∆

__Goals:__ (for SMR)

- consistency (always)
- liveness (eventually, after GST)

__Lecture 8:__ if f ≥ n/3, no protocol satisfies these goals (even with PKI).

| | Synchrony | Partial synchrony | Asynchrony |
|---|---|---|---|
| Permissioned | ✅ PKI, any f<n ⇒ BB protocol; ❌ no PKI, f≥n/3 ⇒ no BB protocol | ✅ ?; ❌ f≥n/3 ⇒ no BA protocol | ❌ f=1 ⇒ no BA protocol |

**Tendermint:** [Buchman-Kwon-Milosevic 2018]
In partial synchrony, there exists an SMR protocol that, when f < n/3, satisfies consistency (always) + liveness (eventually).

## Tendermint: High-Level Ideas

(Remark: with these consensus protocols, the devil is usually in the details. Intuition is frequently misleading in distributed systems, so it is always wise to insist on a formal proof.)

__Idea #1:__ iterated single-shot consensus. (output of each = ordered list of txs, “block”)

- each node $i$ maintains its own “height” $h_i$ (the block number it is currently working on) [during the asynchronous phase, heights may differ across nodes]
- every single message is annotated with “what is the next block that I am trying to figure out”.
- if a node is working on block 9, it ignores all messages about other blocks (with a small exception, discussed later)

__Idea #2:__ for a fixed height, keep proposing + voting until agreement (this eventually succeeds, since GST arrives at some point).

- BB-style: each attempt has a proposer and voters, and the proposer rotates.

__Idea #3:__ two stages of voting. (This is the key innovation)

- Why is one stage not enough? Because different nodes may see different voting outcomes, due to (1) Byzantine nodes and (2) asynchrony.
- Each node can reach one of three outcomes:
    - restart (the vote failed from its viewpoint);
    - commit (it is convinced);
    - intermediate “hedging” outcome (it is convinced, but not 100%!).

## Quorum Certificates (QCs)

__Preliminaries:__

- Assume PKI: all messages are signed by the sender, and public keys are distributed before the protocol starts (not needed by every protocol, but Tendermint needs it)
- __Round__ = interval of 4∆ timesteps (shared clock + known ∆ ⇒ all nodes know the current round)
- Use rotating leaders (one per round) [easy due to the shared clock + permissioned setting]

__Definition:__ a **quorum certificate (QC)** is a batch of ≥$\frac 2 3 n$ signed votes for some block B (at some height, at some round, at some stage (1 or 2) of that round).

__Lemma:__ any two QCs overlap in at least one honest node.
[simply because they overlap in ≥n/3 nodes, and we have a bound f<n/3]

Proof: overlap is $\geq \frac 2 3 n + \frac 2 3 n - n = \frac 1 3 n >f$, qed.
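A quick numerical check of this counting, as a sketch that assumes a QC means at least ⌈2n/3⌉ distinct signers and that f is the largest integer strictly below n/3 (the helper names are made up for illustration). For small n it brute-forces the smallest possible intersection of two quorums and confirms that it equals 2q − n and exceeds f.

```python
from itertools import combinations

def quorum_size(n: int) -> int:
    """Smallest integer q with q >= 2n/3 (ceiling via integer arithmetic)."""
    return -(-2 * n // 3)

def max_faults(n: int) -> int:
    """Largest integer f with f < n/3, i.e. 3f < n."""
    return (n - 1) // 3

def min_quorum_overlap(n: int) -> int:
    """Brute force: smallest intersection of two size-q subsets of n nodes."""
    q = quorum_size(n)
    return min(
        len(set(a) & set(b))
        for a in combinations(range(n), q)
        for b in combinations(range(n), q)
    )

for n in range(1, 10):
    overlap = min_quorum_overlap(n)
    assert overlap == 2 * quorum_size(n) - n   # the 2q - n bound from the proof
    assert overlap > max_faults(n)             # so the overlap has an honest node
    print(n, quorum_size(n), max_faults(n), overlap)
```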

__Corollary:__ any two (⇒ all) QCs for the same [block # + round + stage] must agree on the block.

This is because each honest node votes at most once per referendum [block # + round + stage] (see the pseudo-code later), and by the lemma any two QCs share an honest voter.
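The corollary can also be checked by exhaustive vote counting. In this sketch (the adversary model is an assumption made for illustration), the f Byzantine nodes equivocate by signing votes for both candidate blocks A and B in the same referendum, while each honest node votes for at most one of them. The loop confirms that no way of splitting the honest votes lets both blocks reach a QC when f < n/3.

```python
from itertools import product

def quorum_size(n: int) -> int:
    return -(-2 * n // 3)          # ceil(2n/3)

def max_faults(n: int) -> int:
    return (n - 1) // 3            # largest f with f < n/3

def both_blocks_certified(honest_votes: tuple, f: int, n: int) -> bool:
    """honest_votes[i] is 'A', 'B', or None (abstain) for honest node i.
    Byzantine nodes are assumed to sign votes for BOTH blocks."""
    q = quorum_size(n)
    votes_a = honest_votes.count("A") + f
    votes_b = honest_votes.count("B") + f
    return votes_a >= q and votes_b >= q

for n in range(1, 9):
    f = max_faults(n)
    # every way the n - f honest nodes can vote once (or abstain)
    for assignment in product(["A", "B", None], repeat=n - f):
        assert not both_blocks_certified(assignment, f, n)
```

Dropping the assumption f < n/3 breaks this: with n = 3 and f = 1, one honest vote for A, one for B, and the Byzantine double vote give both blocks 2 ≥ ⌈2n/3⌉ votes, so `both_blocks_certified(("A", "B"), 1, 3)` returns `True`.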

__Plan:__
- each node $i$ maintains a pair of local variables $(B_i,QC_i)$, where the second is a QC for the first
- initially $QC_i$ is null, and $B_i$ = all unexecuted txs that $i$ knows about.
- periodically, node $i$ updates this pair to the most recent (by round, then stage) block-QC pair it has heard about
- it also stores, for future use, any QCs for future blocks (higher block #s)
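A minimal sketch of this per-node state, assuming a QC can be summarized by (height, round, stage, block, signers) and that "more recent" means a lexicographically larger (round, stage) pair. The class and field names are hypothetical, and signature checks are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class QC:
    height: int          # block # this QC is about
    round: int
    stage: int           # 1 or 2
    block: str           # stand-in for the block (e.g., its hash)
    signers: frozenset   # ids of nodes whose signed votes form the QC

    def recency(self) -> Tuple[int, int]:
        # "more recent" = later round; within a round, later stage
        return (self.round, self.stage)

@dataclass
class NodeState:
    height: int                       # h_i: block # currently being decided
    block: str                        # B_i
    qc: Optional[QC] = None           # QC_i (null initially)
    future_qcs: Dict[int, List[QC]] = field(default_factory=dict)

    def on_qc(self, qc: QC) -> None:
        """Store QCs for future heights, adopt a more recent block-QC pair
        for the current height, and ignore QCs for past heights."""
        if qc.height > self.height:
            self.future_qcs.setdefault(qc.height, []).append(qc)
        elif qc.height == self.height:
            if self.qc is None or qc.recency() > self.qc.recency():
                self.block, self.qc = qc.block, qc

s = NodeState(height=9, block="unexecuted-txs")
s.on_qc(QC(9, 3, 1, "B1", frozenset({0, 1, 2})))
s.on_qc(QC(9, 2, 2, "B0", frozenset({0, 1, 3})))  # older round: ignored
s.on_qc(QC(10, 4, 2, "C", frozenset({1, 2, 3})))  # future height: stored
print(s.block, s.qc.round, sorted(s.future_qcs))  # prints: B1 3 [10]
```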

## The Tendermint Protocol (in pseudo-code)

Time is divided into rounds of 4∆ timesteps each: round 0 = [0, 4∆), round 1 = [4∆, 8∆), …, and in general round r = [4∆r, 4∆(r+1)), split into four phases of length ∆.

Fix a height (e.g. block #9), and a round r with leader $\ell$. The round r is divided into 4 phases, and starts exactly at time 4∆r (since one round = 4∆).
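Since the clock is shared and ∆ is known, each node can compute the current round and phase locally. A sketch assuming integer timesteps and a round-robin leader schedule ℓ = r mod n (the notes only say leaders rotate; round-robin is one simple choice, and the helper names are made up):

```python
def round_and_phase(t: int, delta: int) -> tuple:
    """Round r covers [4*delta*r, 4*delta*(r+1)); phases 1..4 last delta each."""
    r = t // (4 * delta)
    phase = (t - 4 * delta * r) // delta + 1
    return r, phase

def leader(r: int, n: int) -> int:
    """Hypothetical round-robin rotation over node ids 0..n-1."""
    return r % n

delta, n = 10, 4
for t in (0, 9, 10, 25, 39, 40, 123):
    r, phase = round_and_phase(t, delta)
    print(f"t={t}: round {r}, phase {phase}, leader {leader(r, n)}")
```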

__t=4∆r:__ (phase 1)

- $\ell$ updates $(B_\ell,QC_\ell)$ to the most recent block-QC pair it knows and proposes it to all nodes (including itself)
- Message looks like this: (round r, height (block #9), $(B_\ell,QC_\ell)$, $sign_\ell$)

__t=4∆r+∆:__ (phase 2)

- If node $i$ receives $(B_\ell,QC_\ell)$ from $\ell$ (it might not if msg is delayed)

and

- If $QC_\ell$ is not older (in terms of rounds/stages) than $QC_i$

then

- Broadcast (to all nodes, including itself) a first-stage vote for $B_\ell$: (round r, height (block #9), “yes” vote for $B_\ell$, $sign_i$)
- Update $(B_i,QC_i):=(B_\ell,QC_\ell)$
- Broadcast $(B_i,QC_i)=(B_\ell,QC_\ell)$
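The phase-2 condition is the "locking" rule that the consistency proof relies on. A minimal sketch, assuming a QC's recency is encoded as a (round, stage) tuple and a null QC counts as older than everything; the function names are hypothetical:

```python
from typing import Optional, Tuple

Recency = Optional[Tuple[int, int]]   # (round, stage) of a QC; None = no QC yet

def not_older(leader_qc: Recency, my_qc: Recency) -> bool:
    """True iff QC_ell is at least as recent as QC_i (None counts as oldest)."""
    if my_qc is None:
        return True
    if leader_qc is None:
        return False
    return leader_qc >= my_qc          # tuples compare by round, then stage

def votes_in_phase2(received: bool, leader_qc: Recency, my_qc: Recency) -> bool:
    """Node i casts a stage-1 vote for B_ell (and adopts (B_ell, QC_ell))
    only if the proposal arrived in time and QC_ell is not older than QC_i."""
    return received and not_older(leader_qc, my_qc)

assert votes_in_phase2(True, None, None)            # nobody has a QC yet
assert not votes_in_phase2(False, (5, 2), None)     # proposal delayed
assert votes_in_phase2(True, (7, 1), (7, 1))        # equally recent: vote
assert not votes_in_phase2(True, (6, 2), (7, 1))    # i is locked on a newer QC
```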

__t=4∆r+2∆:__ (phase 3)

- If node $i$ receives a supermajority, i.e., $\geq \frac 2 3 n$ round-r stage-1 votes for $B$ (counting its own vote and that of the leader $\ell$)

then

- Update $QC_i :=$ the newly received votes, $B_i := B$ (because this is now the most recent QC it knows)
- Broadcast second-stage vote for $B_i$
- Broadcast $(B_i,QC_i)$

__t=4∆r+3∆:__ (phase 4)

- If node $i$ receives $\geq \frac 2 3 n$ round-r stage-2 votes for $B$

then

- Update $QC_i :=$ this QC, $B_i := B$
- Commit $B$ to its local history (because the block $B$ survived two stages of voting!)
- Broadcast $(B_i,QC_i)$
- Increment $h_i$
- re-initialize: $QC_i$ is null, and $B_i$ = all unexecuted txs that $i$ knows about.
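Phases 3 and 4 share one tallying step: keep only the votes for the current (height, round, stage), count each signer at most once, and look for a block with ≥2n/3 signers. A sketch assuming votes arrive as plain tuples whose signatures were already verified; `find_qc` is a hypothetical name. In phase 3 a hit triggers a stage-2 vote; in phase 4 it triggers a commit.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

Vote = Tuple[int, int, int, int, str]   # (signer, height, round, stage, block)

def find_qc(votes: Iterable[Vote], n: int, height: int, rnd: int,
            stage: int) -> Optional[Tuple[str, frozenset]]:
    """Return (block, signers) if some block has >= 2n/3 distinct signers
    among the votes for (height, rnd, stage); otherwise None."""
    signers = defaultdict(set)
    for signer, h, r, s, block in votes:
        if (h, r, s) == (height, rnd, stage):
            signers[block].add(signer)         # duplicate votes count once
    for block, who in signers.items():
        if 3 * len(who) >= 2 * n:              # |who| >= 2n/3 without rounding
            return block, frozenset(who)
    return None

votes = [(0, 9, 3, 1, "B"), (1, 9, 3, 1, "B"), (1, 9, 3, 1, "B"),
         (2, 9, 3, 1, "B"), (3, 9, 3, 1, "C"), (0, 9, 2, 1, "C")]
print(find_qc(votes, n=4, height=9, rnd=3, stage=1))
print(find_qc(votes, n=4, height=9, rnd=3, stage=2))
```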

__t=4∆r+4∆:__ (just before the start of next round r+1)

- If node $i$ has received, in the background, a stage-2 QC for block # $h_i$ supporting a block B

then

- commit B to its local history, increment $h_i$

(repeat this procedure if possible)

__In the background (at all times):__

- Store all QCs received for future blocks $h_i+1$, $h_i+2$, …
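The end-of-round step, combined with the background storage, is a catch-up loop. A sketch assuming stored QCs are kept as a dict from block # to a list of (stage, block) pairs (a hypothetical representation; signatures and QC validity checks are omitted):

```python
from typing import Dict, List, Tuple

def catch_up(height: int, history: List[str],
             stored: Dict[int, List[Tuple[int, str]]]) -> int:
    """Commit blocks certified by stored stage-2 QCs, starting at `height`.
    Returns the new h_i. `stored[h]` lists (stage, block) of QCs for block # h."""
    while True:
        stage2 = [block for stage, block in stored.get(height, []) if stage == 2]
        if not stage2:
            return height                # no stage-2 QC for h_i: stop
        history.append(stage2[0])        # by consistency, all such QCs agree
        height += 1                      # increment h_i and repeat

history = ["b0", "b1"]
stored = {2: [(1, "x"), (2, "b2")], 3: [(2, "b3")], 5: [(2, "b5")]}
h = catch_up(2, history, stored)
print(h, history)                        # prints: 4 ['b0', 'b1', 'b2', 'b3']
```

Block #5 stays stored: the node cannot use it until it has committed block #4.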


## Tendermint: Proof of Consistency

__Theorem:__ Tendermint satisfies SMR consistency (for a given block #, all honest nodes commit the same block).

Proof: Fix a height h (e.g., block #9).

We need to prove that there cannot be two stage-2 QCs for block #9 (from any rounds) that support different blocks. If the two QCs are from the same round, we are done by the overlap lemma and its corollary!

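Let r = the first round in which a set S of >n/3 honest nodes cast stage-2 votes for the same block $B^*$. This is a prerequisite for creating a stage-2 QC (a QC has ≥2n/3 votes and <n/3 nodes are Byzantine ⇒ >n/3 honest nodes voted); if no such round exists, no stage-2 QC for block #9 ever forms and there is nothing to prove.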

__Intuition:__ We want to argue that stage-2 QCs for block #9 can only be for block $B^*$. The nodes in S simply “lock in” on their vote for $B^*$: they never cast a stage-1 vote for a different block, because of the phase-2 rule (vote only if $QC_\ell$ is not older than $QC_i$), and therefore there will never be ≥2n/3 stage-2 votes for a block ≠ $B^*$.
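The counting behind the lock-in, written out under the assumption (established by the induction below) that no node in S casts a stage-1 vote for a block $B \neq B^*$. All stage-1 votes for $B$ then come from honest nodes outside S or from Byzantine nodes:

$$\underbrace{(n - f) - |S|}_{\text{honest nodes outside } S} + \underbrace{f}_{\text{Byzantine nodes}} \;=\; n - |S| \;<\; n - \frac{n}{3} \;=\; \frac{2}{3}n,$$

so no stage-1 QC for $B$ can form. Honest nodes cast stage-2 votes for $B$ only after seeing such a QC, and a stage-2 QC for $B$ would need >n/3 honest votes, so none can form either.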

__Formally:__ (by induction)

- At the end of round r:
    - (i) $B_i=B^*$ for all $i\in S$ [by the pseudo-code: in round-r phase 3, >n/3 nodes cast stage-2 votes for $B^*$ ⇒ in round-r phase 4 there cannot be QCs for other blocks. Nor can there be background QCs for other blocks from the past, because round r is the “first” such round by definition]
    - (ii) $QC_i$ is from round-r stage-1 or later [obvious from the pseudo-code]
    - (iii) all QCs for other blocks are from round r-1 or earlier [semi-obvious from the pseudo-code]
- In round r+1, no node in S changes its mind, so (i), (ii), and (iii) all hold again [by (iii), any QC the leader can propose for a different block is from round r-1 or earlier, hence older than the QCs of the nodes in S by (ii) ⇒ the >n/3 nodes in S don't update and don't cast stage-1 votes for it ⇒ no stage-1 QC, hence no stage-2 QC, for a different block can form in round r+1, qed]
- In the future rounds: same.

[…see notes for more details…]

qed.

## Tendermint: Proof of Liveness

__Claim:__ Tendermint satisfies SMR liveness (eventually).

Our SMR liveness property is weaker than the one from Lecture 4:
- __Old, strong liveness:__ every tx submitted to one honest node gets included.
- __New, weaker liveness:__ every tx submitted to all honest nodes gets included.

This is not a big deal, since honest nodes can share valid txs with each other via a gossip protocol.

Proof: Consider a tx T known to __all__ honest nodes.

Fast-forward to a pair of consecutive rounds $r_1, r_2$ after GST+∆ with honest leaders $\ell_1,\ell_2$ (such a pair exists since f<n/3 and leaders rotate)
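Why such a pair exists, sketched under an assumed round-robin schedule ℓ = r mod n (the notes only say leaders rotate): within one cycle of n rounds, the f Byzantine leaders split the n − f honest leaders into at most f runs of consecutive rounds, so some run has length at least 2 unless n − f ≤ f. The counting needs only f < n/2, so f < n/3 is more than enough; the brute-force check below confirms it for small n.

```python
from itertools import combinations

def has_two_consecutive_honest(n: int, byzantine: set) -> bool:
    """With leader(r) = r % n, rounds r and r+1 have leaders i and (i+1) % n."""
    return any(i not in byzantine and (i + 1) % n not in byzantine
               for i in range(n))

for n in range(1, 11):
    max_f = (n - 1) // 3                      # largest f with f < n/3
    for f in range(max_f + 1):
        for byz in combinations(range(n), f):
            assert has_two_consecutive_honest(n, set(byz))
```

With n = 4 and f = 2 (violating the bound), Byzantine leaders {1, 3} alternate with honest ones and no such pair exists.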

__Lemma:__ at the start of round $r_1$, for some h, every honest node is working on block # h or h+1.
[Roughly, this is because after committing a block, an honest node broadcasts the stage-2 QC for it, and since we are post-GST, those broadcasts do arrive at the other honest nodes!]

Proof: […see notes…]

__Definition:__ a round is **clean** if (i) it is post-GST; (ii) its leader is honest; (iii) all honest nodes are working on the same block #; (iv) after the update in phase 1, the leader’s QC is at least as recent as that of every honest node

__Lemma:__ clean round ⇒ all honest nodes commit the block proposed by the leader.
[proof is by inspecting pseudo-code + remembering we are in post-GST…]
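A toy simulation of one round under the clean-round conditions, assuming (beyond the notes) that Byzantine nodes simply stay silent and that post-GST delivery means every honest message sent in one phase is seen by all honest nodes in the next. Because f < n/3, the n − f honest votes alone reach the 2n/3 threshold. It is a sketch of the pseudo-code-inspection argument, not a proof; all names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Recency = Optional[Tuple[int, int]]       # (round, stage) of a QC; None = no QC

def rank(qc: Recency) -> Tuple[int, int]:
    return (-1, 0) if qc is None else qc  # None is older than any real QC

def quorum(count: int, n: int) -> bool:
    return 3 * count >= 2 * n             # count >= 2n/3

def run_round(n: int, byzantine: Set[int], leader: int, proposal: str,
              leader_qc: Recency, qcs: Dict[int, Recency]) -> Dict[int, Optional[str]]:
    """One round for a fixed height, assuming: post-GST, an honest leader,
    all honest nodes on the same height, and silent Byzantine nodes.
    qcs[i] is honest node i's QC recency at the start of phase 2."""
    honest = [i for i in range(n) if i not in byzantine]
    assert leader in honest
    # Phase 2: every honest node gets the proposal; votes iff QC_ell not older.
    stage1 = [i for i in honest if rank(leader_qc) >= rank(qcs[i])]
    # Phase 3: all stage-1 votes arrive; a stage-1 QC triggers stage-2 votes.
    stage2 = honest if quorum(len(stage1), n) else []
    # Phase 4: a stage-2 QC means every honest node commits the proposal.
    committed = proposal if quorum(len(stage2), n) else None
    return {i: committed for i in honest}

# Clean round: the leader's QC is at least as recent as every honest node's QC.
print(run_round(4, {3}, 0, "B", (2, 1), {0: (2, 1), 1: (2, 1), 2: None}))
# Condition (iv) fails: nodes 1 and 2 hold newer QCs, so they do not vote.
print(run_round(4, {3}, 0, "B", (1, 1), {0: (1, 1), 1: (2, 1), 2: (2, 1)}))
```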

__Case 1:__ all honest nodes start round $r_1$ working on block # h+1.
⇒ round $r_1$ is clean and commits a block including T

__Case 2:__ all honest nodes start round $r_1$ working on block # h.
⇒ round $r_1$ is clean and commits some block ⇒ round $r_2$ is clean and commits a block including T

__Case 3:__ the leader is behind.
[…]

__Case 4:__ the leader is ahead.
[…]

## Can we do better?

- Can’t increase # of Byzantine nodes (without compromising elsewhere)
- Can’t relax partial synchrony to asynchrony
- Can’t have both liveness and safety before GST

Alternative trade-offs:

- Longest chain consensus favors liveness over safety! (see next Lecture)

Same guarantees (fault-tolerance, consistency, eventual liveness) but better performance (smaller communication complexity, fewer rounds, faster recovery time post-GST, etc):

- HotStuff (used by Facebook's Diem)
- Casper FFG (now used by Ethereum)