More and more applications are now distributed and in non-trivial distributed applications, it appears that the computing entities (processes) have to agree in one way or another, for example, to take a common decision, execute specific actions, or validate some commitment. The most famous distributed agreement problem is the consensus problem. A process crash failure occurs when a process stops prematurely; it can be seen as a benign failure, as a crashed process did not pollute the computation before crashing. The situation is different with Byzantine failures. A process has a Byzantine behavior when it arbitrarily deviates from its intended behavior. Let us notice that, from a failure hierarchy point of view, process crashes (unexpected halting) constitute a strict subset of Byzantine failures. As message-passing distributed systems are more and more pervasive, the assumption "no process has a bad behavior" is no longer sensible. Hence, agreement in Byzantine message-passing systems is becoming a more and more important issue of fault-tolerance.
This lecture aims to show how the consensus problem can be solved in both synchronous and asynchronous distributed message-passing systems prone to Byzantine failures. We present some lower bounds on the ratio of Byzantine processes that can be tolerated and how these bounds evolve according to the expected complexity of the obtained solutions. In synchronous systems, we will present the basic consensus algorithm called EIG (exponential information gathering) and show how its exponential complexity can be lowered. In the asynchronous model, consensus cannot be solved deterministically, we will show how to circumvent this impossibility by using timing assumptions or randomization. For this and in order to have more intuitive algorithms, we will introduce a series of broadcast primitives that help reduce the noise due to the Byzantine behavior of part of the processes.
Slides and exercises