Introduction
In Lisbon we came to a (sketchy) agreement on extending Hare protocol so to introduce proper equivocations defense. Hereby I am trying to provide a more precise description of the proposed changes. Because there are some open questions left, this topic is created to continue the discussion, which should hopefully conclude with the final design.
Caution: The description of the problem and its solution is in the context of Hare protocol but we hope to reuse the solution (if possible) for Tortoise and Random Beacon protocols.
Problem
There are 4 types of faulty behaviour in Hare:
- Missing message - a message was expected never arrived.
- Invalid message - when the signature of a message is not valid, or the message binary structure is incorrect or the eligibility proof is incorrect.
- Equivocation - when the the same eligibility is used to produce two (or more) different messages.
In case (1) there is not much we can do about it.
Case (2) has really two subcases - either the sender’s signature is OK (then we know who committed this fault) or the signature is not OK (then the message is just a simple spam).
Case (3) is the most dangerous one and this is really the reason why we need to prepare defense.
Why equivocations are so painful
Spreading equivocations is the primary way of attacking Hare protocol (because any other forms of attack are very easy to filter out). Moreover, by reusing one eligibility for several messages, an attacker can spam the network by gossiping millions of messages - every such message would be valid and legal by itself. These legal messages can make their way to become dependencies to messages from honest nodes before even recognizing that an attack it taking place.
Conceptual path to the solution
Ideally we would like to blacklist a node A as soon as we find A is equivocating. This could be formalized as the following “naive” strategy:
- Every node P tracks all incoming messages, so to discover equivocations.
- As soon as P discovers that A equivocated, P adds A to its local registry of “known equivocators”.
- P starts ignoring all messages coming from A. “Ignoring” means:
- P will no longer process messages from A at the Hare logic level
- P will no longer forward messages from A at the gossip level
Unfortunately, the above strategy has 3 problems:
- As soon as P blacklists A, it stops forwarding messages from A, so it prevents (or make it more difficult) for other honest nodes to witness malfeasant behaviour of A. Therefore, the awareness of “who are the bad guys” is not going to converge among hones nodes.
- As soon as P blacklists A, P can still receive messages from other honest node Q which is not yet aware that A is malicious. Possibly then Q included a message from A as part of a certificate, which then looks valid to Q, but will not be confirmed as valid by P.
- And what we should really do with the “old” messages from a node A ? For example it may be the case that we have a certificate from node Q including f+1 status messages, but after removing ones from a blacklisted node A, the number of remaining status messages may go below f+1, rendering the whole certificate invalid. How we should handle this ?
The actual solution
To handle above difficulties, we propose a more sophisticated solution, composed of the following elements:
- Every node maintains a collection:
knownEquivocators: Map[NodeId, EquivocatorInfo]
where EquivocatorInfo
is the following structure:
struct EquivocatorInfo {
proof: MalfeasanceProof
eligibilities: Map[RoundId, EligibilityProof]
}
-
As soon as a new equivocation is observed, a “malfeasance proof” for this equivocation is formed. Such a proof must contain two valid (and different) messages from the same sender, both targeting the same Hare round, so reusing the same eligibility. The proof must be signed by the observer. When node P observes a new equivocation committed by node A:
- P adds A to its the local registry of equivocators (with malfeasance proof)
- P broadcasts the malfeasance proof for A
- The “standard handling for known equivocators” is executed.
-
(“standard handling for known equivocators”) Whenever a valid message M from a known equivocator A arrives to node P:
- the eligibility proof included in M is stored in the corresponding entry of
knownEquivocators
- P broadcasts the eligibility proof of M and (separately) the malfeasance proof for A; the broadcasting is however “smart” (see below)
- the eligibility proof included in M is stored in the corresponding entry of
-
Broadcasting of malfeasance proofs and eligibility proofs is “smart” in the following sense:
- any two malfeasance proofs for the same node are considered equivalent
- any two eligibility proofs for the same
(nodeId, roundId)
are considered equivalent - any node P, while running the gossip protocol, does not forward a message M if it previously did a broadcast of a message equivalent to M (please notice that
knownEquivocators
contain enough information to implement this behaviour)
-
We change the “f+1” rule so to avoid reverting the past. This is however a more involved topic and we devote several next sections to explain it in detail.
The origin of f+1 rule (ADDNR18)
Throughout the Hare protocol there are several places where we are using the “f+1” rule. This rule is originally inherited from the ADDNR18 protocol: in the original assumptions of ADDNR18 we had (at most) f faulty nodes and total number of nodes n=2f+1. This was another way of saying that we assume strict majority of honest nodes and that we have at most f faulty nodes. So having these assumptions it follows that in every collection of f+1 nodes there must be at least one honest node.
How f+1 rule was adopted in Hare
There are 3 changes from ADDNR18:
- The collection of nodes participating in every round is selected randomly (via VRF), with the randomness calibrated so that 800 nodes on average are picked (all of the nodes are to be picked if the total size of the network is less than 800)
- We assume the total volume of faulty nodes does not exceed 1/3 (so this is far more optimistic assumption than the 50% assumed by ADDNR18)
- ADDNR18 assumed all nodes have weight 1, while in Hare we allow arbitrary weights; f+1 rule sums up weights
In Hare, the f value is just 800/2=400, and “f+1” is changed to “f+epsilon”, where epsilon is configurable.
Given all this together, a practical conclusion very similar to that of ADDNR18 still holds: in every Hare round, if a subset of the committee happens to have total weight exceeding f+epsilon, it is almost sure (think “overwhelming probability”) that at least one member of the committee is honest.
Caution: For simplicity of further reasoning we however stick to the original mathematical formulation of Hare (i.e. without weights)
.
All places where we currently use f+1 rule
- combining pre-round messages so to find values that have sufficient support (validation of status messages is based on this)
- building safe value proofs
- building commit certificates
- counting notify messages (protocol termination condition)
How we need to adjust the f+1 rule for the equivocations defense
Where things go wrong is the following scenario:
- node A behaves honestly for some initial part of Hare run
- messages created by node A become dependencies in several f+1 rule cases
- then A turns bad and starts spreading equivocations
- this gets noticed, so honest nodes blacklist A and start “reverting the past”
This “reverting of the past” is what we would like to avoid. An idea towards achieving this goal would is the following one:
While calculating the total weight of a collection-of-messages Q included in a certificate belonging to round r
: (for a collection of nodes A
we write |A|
to denote the sum of Hare-level eligibilities in A
)
- let
KE(r)
be the set of known equivocators for roundr
(i.e. equivocators for which we managed to collect eligibility proofs for the relevant round) - let
k = |KE(r)|
- let
Q' = {msg ∊ Q: msg.sender ∉ KE(r)}
- let
x = |Q'|
- we say that set
Q
passed the f+1-rule ifx >= f+1-k
The rationale behind this new form of f+1 rule is: whoever was using the f+1 rule might not be able to collect missing k messages because k nodes are known equivocators and so their messages got blacklisted by the p2p network. Moreover we want all the honest nodes to converge to the same view of the world, despite random order of arrival of messages send by equivocators, and also random timing of discovering equivocation-type violations of the protocol.
How the equivocation defense plays out in the case of pre-round spam attack
Our current handling of pre-round messages is: when a node A equivocates in pre-round, all the distinct variants of its pre-round message are combined at destination by taking the sum of candidate sets (and this also applies to delayed pre-round messages). This strategy has the advantage of converging to a coherent view across all honest nodes of what the initial set from node A should really be, and it is also resistant to balancing attacks over pre-round messages.
But in the context of the equivocation defense proposed above, the way we handle pre-round change: when node A sends second pre-round message, it will eventually be recognized as an equivocator. It is interesting to observe that, under the changed “f+1” rule, all honest nodes converge in the way they combine initial sets. This may be illustrated by an example.
Let’s say we have 5 nodes, and f=2 (so f+1=3). A will be the equivocator, everyone else is honest.
These are initial sets gossiped in the pre-round:
A: m1=[1, 2, 3, 6, 7] and m2=[1, 2, 4, 6, 7]
B: [2, 3, 4]
C: [1, 2, 7]
D: [ 3, 4, 5]
E: [5, 7 ,8]
m1
and m2
are two messages which form the equivocation in question. Say that node B first observes m1 then m2, while node C first observes m2, then m1.
Initially, node B observes only message m1, hence up to this point A seems to be honest. The set of values with “f+1” support, as seen by B, is [2, 3, 7]. At the same time, node C received only message m2 from A, and comes up with the following collection of values with “f+1” support: [2, 4, 7].
Now, some times later, both B and C gain the knowledge that A equivocated - either by observing the second message or by receiving the malfeasance proof. They now must re-calculate the set of values with “f+1” support according to the new knowledge. The number of known equivocators k is equal to 1, and both B and C have the eligibility proof for node A to be participant of the pre-round. Now they apply the “f+1-k” rule, where f+1-k=2.
- B concludes with the set [2, 3, 4, 5, 7]
- C concludes with the set [2, 3, 4, 5, 7]
What is interesting to notice:
- in doing this calculation both B and C completely ignore messages from equivocating A
- out of a sudden value 5, which was never proposed by A, gets included in the resulting set
Open questions left
- It is not clear how we should handle the other type of fault - namely the “invalid message” fault. Possibly we could apply the same strategy as in the case of an equivocation. Alternatively, we can just ignore invalid messages.
- The collection of equivocators should be maintained as a persistent collection “forever” (i.e. during all the centuries of the existence of Spacemesh blockchain). It feels like it would be more consistent if the list is actually part of the consensus. It could be for example that we reuse Hare for the purpose (nothing stops us from running Hare simultaneously on “red” and “blue” balls, where “red” balls correspond to proposals, while “blue” balls are identities of equivocators). This way the collection of newly discovered equivocators could be part of the block contents. However, this does not play well with our blocks not having a direct link to the parent block. In conclusion, the ever growing registry of equivocators feels like a potential “problem”, and it is even more problematic that it is both critical for consensus and not part of consensus. We would like for example that eligibility of a know equivocator is invalid, despite its ATX being OK. For this to be possible, however, the equivocators registry would have to be part of consensus.