Hare equivocations defense

In Lisbon we came to a (sketchy) agreement on extending the Hare protocol so as to introduce a proper equivocations defense. Here I attempt to provide a more precise description of the proposed changes. Because some open questions remain, this topic is created to continue the discussion, which should hopefully conclude with the final design.

Caution: The description of the problem and its solution is in the context of the Hare protocol, but we hope to reuse the solution (if possible) for the Tortoise and Random Beacon protocols.

There are 3 types of faulty behaviour in Hare:

  1. Missing message - a message that was expected never arrived.
  2. Invalid message - the signature of a message is invalid, the message's binary structure is incorrect, or the eligibility proof is incorrect.
  3. Equivocation - the same eligibility is used to produce two (or more) different messages.

In case (1) there is not much we can do.
Case (2) has really two subcases - either the sender’s signature is OK (then we know who committed this fault) or the signature is not OK (then the message is just simple spam).
Case (3) is the most dangerous one, and it is the real reason why we need to prepare a defense.

Why equivocations are so painful
Spreading equivocations is the primary way of attacking the Hare protocol (because any other form of attack is very easy to filter out). Moreover, by reusing one eligibility for several messages, an attacker can spam the network by gossiping millions of messages - every such message would be valid and legal by itself. These legal messages can make their way into dependencies of messages from honest nodes before anyone even recognizes that an attack is taking place.

Conceptual path to the solution
Ideally we would like to blacklist a node A as soon as we find A is equivocating. This could be formalized as the following “naive” strategy:

  1. Every node P tracks all incoming messages, so as to discover equivocations.
  2. As soon as P discovers that A equivocated, P adds A to its local registry of “known equivocators”.
  3. P starts ignoring all messages coming from A. “Ignoring” means:
  • P will no longer process messages from A at the Hare logic level
  • P will no longer forward messages from A at the gossip level

Unfortunately, the above strategy has 3 problems:

  1. As soon as P blacklists A, it stops forwarding messages from A, so it prevents (or makes it more difficult for) other honest nodes to witness the malfeasant behaviour of A. Therefore, the awareness of “who the bad guys are” is not going to converge among honest nodes.
  2. As soon as P blacklists A, P can still receive messages from another honest node Q which is not yet aware that A is malicious. Q may have included a message from A as part of a certificate, which then looks valid to Q, but will not be confirmed as valid by P.
  3. What should we really do with the “old” messages from node A? For example, we may have a certificate from node Q including f+1 status messages, but after removing the ones from a blacklisted node A, the number of remaining status messages may drop below f+1, rendering the whole certificate invalid. How should we handle this?

The actual solution
To handle the above difficulties, we propose a more sophisticated solution, composed of the following elements:

  1. Every node maintains a collection:
    knownEquivocators: Map[NodeId, EquivocatorInfo]

where EquivocatorInfo is the following structure:

    struct EquivocatorInfo {
        proof: MalfeasanceProof
        eligibilities: Map[RoundId, EligibilityProof]
    }

  2. As soon as a new equivocation is observed, a “malfeasance proof” for this equivocation is formed. Such a proof must contain two valid (and different) messages from the same sender, both targeting the same Hare round, thus reusing the same eligibility. The proof must be signed by the observer. When node P observes a new equivocation committed by node A:

    • P adds A to its local registry of equivocators (with the malfeasance proof)
    • P broadcasts the malfeasance proof for A
    • The “standard handling for known equivocators” is executed.
  3. (“standard handling for known equivocators”) Whenever a valid message M from a known equivocator A arrives at node P:

    • the eligibility proof included in M is stored in the corresponding entry of knownEquivocators
    • P broadcasts the eligibility proof of M and (separately) the malfeasance proof for A; the broadcasting is however “smart” (see below)
  4. Broadcasting of malfeasance proofs and eligibility proofs is “smart” in the following sense:

    • any two malfeasance proofs for the same node are considered equivalent
    • any two eligibility proofs for the same (nodeId, roundId) are considered equivalent
    • any node P, while running the gossip protocol, does not forward a message M if it previously broadcast a message equivalent to M (notice that knownEquivocators contains enough information to implement this behaviour)
  5. We change the “f+1” rule so as to avoid reverting the past. This is, however, a more involved topic, and we devote the next several sections to explaining it in detail.
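The registry and the “standard handling” steps above can be sketched in Python as follows. This is only an illustrative sketch - the class and field names are my own, not from any spec - showing the two invariants: one malfeasance proof broadcast per node, and one eligibility proof broadcast per (node, round).

```python
from dataclasses import dataclass, field

@dataclass
class EquivocatorInfo:
    proof: bytes                                       # the malfeasance proof
    eligibilities: dict = field(default_factory=dict)  # round_id -> eligibility proof

class EquivocationRegistry:
    def __init__(self, broadcast):
        self.known = {}              # node_id -> EquivocatorInfo
        self.broadcast = broadcast   # injected gossip-broadcast callback

    def on_equivocation(self, node_id, malfeasance_proof):
        # Record the equivocator and broadcast the proof - at most once per node.
        if node_id not in self.known:
            self.known[node_id] = EquivocatorInfo(proof=malfeasance_proof)
            self.broadcast(("malfeasance", node_id, malfeasance_proof))

    def on_message_from_equivocator(self, node_id, round_id, eligibility_proof):
        # "Standard handling": store the eligibility proof and do a "smart"
        # broadcast - at most once per (node_id, round_id).
        info = self.known[node_id]
        if round_id not in info.eligibilities:
            info.eligibilities[round_id] = eligibility_proof
            self.broadcast(("eligibility", node_id, round_id, eligibility_proof))
```

The `broadcast` callback stands in for the gossip layer; the registry contains exactly the information needed to suppress equivalent re-broadcasts.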

The origin of f+1 rule (ADDNR18)
Throughout the Hare protocol there are several places where we use the “f+1” rule. This rule is inherited from the ADDNR18 protocol: in the original assumptions of ADDNR18 we had (at most) f faulty nodes and a total number of nodes n=2f+1. This was another way of saying that we assume a strict majority of honest nodes and at most f faulty nodes. Given these assumptions, it follows that every collection of f+1 nodes must contain at least one honest node.

How f+1 rule was adopted in Hare
There are 3 changes from ADDNR18:

  • The collection of nodes participating in every round is selected randomly (via VRF), with the randomness calibrated so that 800 nodes on average are picked (all of the nodes are to be picked if the total size of the network is less than 800)
  • We assume the total fraction of faulty nodes does not exceed 1/3 (a far more optimistic assumption than the 50% assumed by ADDNR18)
  • ADDNR18 assumed all nodes have weight 1, while in Hare we allow arbitrary weights; the f+1 rule sums up weights

In Hare, the f value is just 800/2=400, and “f+1” is changed to “f+epsilon”, where epsilon is configurable.

Putting all this together, a practical conclusion very similar to that of ADDNR18 still holds: in every Hare round, if a subset of the committee happens to have total weight exceeding f+epsilon, it is almost certain (think “overwhelming probability”) that at least one member of that subset is honest.
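As a back-of-the-envelope check of the “overwhelming probability” claim, the following sketch computes, under simplifying assumptions (unit weights, a committee of exactly 800 members drawn i.i.d. from a population with exactly 1/3 faulty nodes), the exact probability that faulty members alone exceed f=400:

```python
from fractions import Fraction
from math import comb

n, f = 800, 400  # committee size and the f threshold

# P[#faulty > f] = sum_{k=f+1}^{n} C(n,k) (1/3)^k (2/3)^(n-k).
# Computed exactly with integers: each term is C(n,k) * 2^(n-k) / 3^n.
numerator = sum(comb(n, k) * 2 ** (n - k) for k in range(f + 1, n + 1))
tail = Fraction(numerator, 3 ** n)
print(float(tail))  # a value far below 1e-20: negligible
```

The deviation from the mean (800/3 ≈ 267) to 400 is roughly ten standard deviations, which is why the tail is so small.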

Caution: For simplicity of further reasoning we nevertheless stick to the original mathematical formulation of Hare (i.e. without weights).

All places where we currently use the f+1 rule

  1. combining pre-round messages so as to find values that have sufficient support (validation of status messages is based on this)
  2. building safe value proofs
  3. building commit certificates
  4. counting notify messages (protocol termination condition)

How we need to adjust the f+1 rule for the equivocations defense
Where things go wrong is the following scenario:

  • node A behaves honestly for some initial part of Hare run
  • messages created by node A become dependencies in several f+1 rule cases
  • then A turns bad and starts spreading equivocations
  • this gets noticed, so honest nodes blacklist A and start “reverting the past”

This “reverting of the past” is what we would like to avoid. An idea towards achieving this goal is the following:

While calculating the total weight of a collection of messages Q included in a certificate belonging to round r (for a collection of nodes A we write |A| to denote the sum of Hare-level eligibilities in A):

  • let KE(r) be the set of known equivocators for round r (i.e. equivocators for which we managed to collect eligibility proofs for the relevant round)
  • let k = |KE(r)|
  • let Q' = {msg ∊ Q: msg.sender ∉ KE(r)}
  • let x = |Q'|
  • we say that the set Q passes the f+1 rule if x >= f+1-k

The rationale behind this new form of the f+1 rule is: whoever was using the f+1 rule might not be able to collect the missing k messages, because k nodes are known equivocators and so their messages got blacklisted by the p2p network. Moreover, we want all the honest nodes to converge to the same view of the world, despite the random order of arrival of messages sent by equivocators, and also the random timing of discovering equivocation-type violations of the protocol.
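The adjusted rule above can be sketched as follows, in the unweighted setting (every message and equivocator counts as 1; with weights, the counts would become weight sums). `ke_r` stands for KE(r), the set of equivocators with a collected eligibility proof for round r; the field name `sender` is illustrative.

```python
def passes_f_plus_1_rule(q, ke_r, f):
    """Check the adjusted "f+1" rule for a collection of messages q."""
    k = len(ke_r)                                        # k = |KE(r)|
    q_prime = [m for m in q if m["sender"] not in ke_r]  # Q' = non-equivocator part
    return len(q_prime) >= f + 1 - k                     # x >= f+1-k
```

For example, with f=2 and one known equivocator A, two messages from honest senders B and C suffice, even though the plain f+1 rule would require three.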

How the equivocation defense plays out in the case of pre-round spam attack
Our current handling of pre-round messages is: when a node A equivocates in the pre-round, all the distinct variants of its pre-round message are combined at the destination by taking the union of the candidate sets (and this also applies to delayed pre-round messages). This strategy has the advantage of converging to a coherent view, across all honest nodes, of what the initial set from node A should really be, and it is also resistant to balancing attacks over pre-round messages.

But in the context of the equivocation defense proposed above, the way we handle the pre-round changes: when node A sends a second pre-round message, it will eventually be recognized as an equivocator. It is interesting to observe that, under the changed “f+1” rule, all honest nodes converge in the way they combine initial sets. This may be illustrated by an example.

Let’s say we have 5 nodes, and f=2 (so f+1=3). A will be the equivocator, everyone else is honest.

These are initial sets gossiped in the pre-round:

A: m1=[1, 2, 3, 6, 7] and m2=[1, 2, 4, 6, 7]
B: [2, 3, 4]
C: [1, 2, 7]
D: [3, 4, 5]
E: [5, 7, 8]

m1 and m2 are two messages which form the equivocation in question. Say that node B first observes m1 then m2, while node C first observes m2, then m1.

Initially, node B observes only message m1, hence up to this point A seems to be honest. The set of values with “f+1” support, as seen by B, is [2, 3, 7]. At the same time, node C received only message m2 from A, and comes up with the following collection of values with “f+1” support: [2, 4, 7].

Now, some time later, both B and C gain the knowledge that A equivocated - either by observing the second message or by receiving the malfeasance proof. They must now re-calculate the set of values with “f+1” support according to the new knowledge. The number of known equivocators k equals 1, and both B and C have the eligibility proof for node A as a participant of the pre-round. Now they apply the “f+1-k” rule, where f+1-k=2.

  • B concludes with the set [2, 3, 4, 5, 7]
  • C concludes with the set [2, 3, 4, 5, 7]

What is interesting to notice:

  • in doing this calculation, both B and C completely ignore the messages from the equivocating A
  • all of a sudden, value 5, which was never proposed by A, gets included in the resulting set
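The re-calculation both B and C perform can be replayed mechanically. With A a known equivocator holding a pre-round eligibility proof, k=1 and the support threshold drops to f+1-k=2, while A's messages are ignored entirely:

```python
from collections import Counter

f, k = 2, 1                      # threshold is f + 1 - k = 2
honest_sets = {                  # pre-round sets from the honest nodes only
    "B": {2, 3, 4},
    "C": {1, 2, 7},
    "D": {3, 4, 5},
    "E": {5, 7, 8},
}
# Count how many honest senders support each value, keep those at threshold.
support = Counter(v for s in honest_sets.values() for v in s)
result = sorted(v for v, c in support.items() if c >= f + 1 - k)
print(result)  # [2, 3, 4, 5, 7]
```

Both B and C arrive at [2, 3, 4, 5, 7], regardless of which of m1, m2 each of them saw first.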

Open questions left

  • It is not clear how we should handle the other type of fault - namely the “invalid message” fault. Possibly we could apply the same strategy as in the case of an equivocation. Alternatively, we can just ignore invalid messages.
  • The collection of equivocators should be maintained as a persistent collection “forever” (i.e. for as long as the Spacemesh blockchain exists). It feels like it would be more consistent if the list were actually part of the consensus. For example, we could reuse Hare for this purpose (nothing stops us from running Hare simultaneously on “red” and “blue” balls, where “red” balls correspond to proposals and “blue” balls are identities of equivocators). This way the collection of newly discovered equivocators could be part of the block contents. However, this does not play well with our blocks not having a direct link to the parent block. In conclusion, the ever-growing registry of equivocators feels like a potential “problem”, and it is even more problematic that it is both critical for consensus and not part of consensus. We would like, for example, the eligibility of a known equivocator to be invalid, despite its ATX being OK. For this to be possible, however, the equivocators registry would have to be part of consensus.

Case (2) has really two subcases - either the sender’s signature is OK (then we know who committed this fault)

note that another signed type can be decoded as a hare message. signature will be ok, but message won’t pass syntactic check. so it would be wrong to cancel identity in such case. maybe we should be using domain separators for each individual signed type to avoid this

The proof must be signed by the observer.

why does the proof need to be signed by the observer? two equivocating messages are already a proof by themselves, so what is the purpose of the signature? what if equivalent proofs are signed by different observers - why do we need to collect only one then, or do we want to collect all for no reason?

This “reverting of the past” is what we would like to avoid. An idea towards achieving this goal is the following:

beside that part we also need to cancel identities and exclude them from eligibility computation for iterations in current or future layers (basically ensuring that k parameter doesn’t include participants that equivocated previously)

why does the proof need to be signed by the observer?

For the same reason we require every other Hare message to be signed. If signatures are not required, then hackers can abuse this message type for spam attacks.

what if equivalent proofs are signed by different observers

As I explained above, any two proofs that node A has committed a crime are considered equivalent. Therefore, an honest node P will broadcast a proof of “A is wrong” at most once. Also, every honest node will keep only one proof of malfeasance per node (in the equivocators registry).

beside that part we also need to cancel identities and exclude them from eligibility computation for iterations in current or future layers

Yes, we would like to do so. That said, I perceive this part to be a bit tricky: we effectively need to maintain “sort of” consensus on who was wrong, but outside the consensus protocol.

The root cause of the troubles seems to be our block not including a pointer to the parent block. If a parent pointer were there, then we could maintain the equivocators list as part of Hare consensus (imagine blue and red marbles - blue marbles are “proposals”, red marbles are “equivocators”, both types of marbles coexist in sets subject to Hare consensus; the equivocators registry then gets persisted in a block, and so in layer T+1 we have a consensus on excluding identities marked as “bad” in all blocks up to layer T).

In our current design, all we can do is maintain the equivocators registry forever - but such a registry is per-node (so, in general, registries are not in sync). Then, honest nodes can just blacklist messages from whoever is on the list. But I wonder what @talm’s view is.

For the same reason we require every other Hare message to be signed. If signatures are not required, then hackers can abuse this message type for spam attacks.

i don’t think that this is true in this case.
proof is composed from two equivocating messages, if both of them are valid - then proof is valid too. if any of them is not valid - proof is also invalid. additional signature is pretty much irrelevant, and will add only more bloat and complexity to the protocol.

moreover we don’t even need a separate type for such proof, as messages can be delivered separately. and reconstructed on the receiver side into the “proof”.


Since writing this @dmitry has already asked this and I agree with his explanation. Let me phrase it differently:

A fraud proof that is invalid (i.e. is not two independently-valid Hare messages using the same eligibility) will be syntactically invalid. It will not be gossiped and the P2P transmitter of such a message will be dropped. This means the attack surface is minimal (only direct neighbors) and there are already endless other messages that one can send that have the same effect. Validating a fraud proof like this is also not too expensive.

It’s worth noting that we assume this about the general population so that we can derive from this assumption that any randomly selected committee of 800 will be no more than 50% “faulty” with high probability. So, at the Hare level it’s basically the same 50% assumption from ADDNR18.

It’s not exactly arbitrary. We use integer “weights” that in the average case sum up to 800. Here’s an analogy: each smesher in the system has a pool of 800 “electors.” Instead of picking 800 smeshers, we pick 800 electors from all the electors in the system (based on the smesher’s allocated space-time). This means each smesher may have more than one vote, but since we can’t guarantee that each smesher uniquely represents each real-world entity, this system is no different from the original in terms of the potential to control more than one vote. Put another way: all Hare votes are equal, and they’re randomly assigned based on relative space-time.

Reading the rest shows that you understand it, but leaving this comment for others as well.

I think the proper way to do this (with less code changes as well) is to add this info to the proposal.

However, the problem with this approach (committee agrees and proofs are discarded) is that a temporary majority would allow an attacker to falsely blacklist smeshers, whereas the “naïve” approach of just storing the proofs is safer in that regard.

moreover we don’t even need a separate type for such proof, as messages can be delivered separately. and reconstructed on the receiver side into the “proof”.

The separate type of message (EquivocationProof) is really needed. Let me explain the reasons again. For this, let’s say that messages p and q form an equivocation and their sender is node B. Let H1 and H2 be honest nodes, and let EqProof(p,q) be the equivocation proof including p and q:

  • as soon as H1 adds B to its equivocators registry, H1 will blacklist all messages from B
  • H2 may never witness messages p and q, because the blacklisting “wave” can grow quickly enough to prevent some honest nodes from observing the actual equivocations
  • EqProof(p,q) will bypass blacklisting because its sender is not B
  • we need a “special” behaviour of gossip for messages of type EqProof: an equivocation proof for B will be forwarded at the gossip level ONLY if the forwarder has just realized that B is bad; this is what we mean by “all equivocation proofs are considered equivalent”

What I wrote above is only possible if we do establish a dedicated message type, containing a pair of messages.

proof is composed from two equivocating messages, if both of them are valid - then proof is valid too. if any of them is not valid - proof is also invalid. additional signature is pretty much irrelevant, and will add only more bloat and complexity to the protocol.

Your interpretation is technically possible. But in my original proposal I suggested that we should consider “invalid messages” to be a protocol fault for which the penalty is the same as in the case of equivocation. If we accept such semantics, then we need all messages to be signed (including equivocation proofs and, generally, all fault proofs). Basically, to blacklist a guy we need to know who the guy was.

could you, please, help me understand what the difference is from a version where the new message type is not added. specifically, there are 3 equivocating messages (p, q, c), 2 honest nodes (h1 and h2), and b as the adversary.

  • h1 receives p. h1 broadcasts p.
  • h2 receives q. h2 broadcasts q.
  • h1 receives q. h1 broadcasts q. b is marked as adversarial and added to some map.
  • h2 receives c. h2 broadcasts c. b - is adversarial.
  • h2 receives p. h2 ignores p. it already has (q, c)
  • h1 receives c. h1 ignores c. it already has (p, q)

(p,q) and (q,c) are equivalent proofs that were reconstructed without a new message type. where is the difference?

OK, so in your approach blacklisting of a bad guy happens AFTER adding to the equivocators registry, and forwarding a message happens before updating the registry. In other words we forward the first “bad” message seen, but block all next bad messages from the same sender.

In my approach, we never forward a message once recognizing a node is bad, but we form a fault proof immediately and we broadcast the proof only.

In principle I think that both solutions should work. Where I am hesitating is that the correctness of your solution depends in a subtle way on how the gossip protocol is implemented. The rest of my post is devoted to explaining the “subtle” part.

All that we want is that all honest nodes eventually converge on the set of recognized “bad guys”. In my solution this convergence is based on fault proofs - and every such fault proof is just a single message that is never “blocked”, but sometimes just replaced by an equivalent message. In your solution this convergence is based on the transmission of messages that WILL be blocked by some nodes. So, the usual invariant of the gossip protocol - “once an honest node A broadcasts a message M, this message will reach every other honest node B within some time T” - is no longer true when M is sent by an equivocator.

This is a problem. Hence, to prove that your algorithm is correct, we can no longer consider the gossip protocol to be a black box which somehow delivers all broadcast messages. Rather, we have to assume some model of how gossip works, and then we need to use this assumption to form a proof. Let me attempt to go further down this path.

As an example I will assume such a model:

  • nodes form a fixed, directed, strongly connected graph (where an edge represents a direct network connection in the P2P network)
  • every vertex A, on receiving message M via edge E:
    • ignores message M, if M was previously seen
    • otherwise, forwards M via every edge outgoing from A, with the exception of edge E
    • … unless your algorithm of blacklisting is applied

Given this model of gossip protocol I am able to formally prove that your algorithm works OK as long as for every pair of honest nodes A,B there is a path from A to B going via honest nodes.

But whether this model of gossip correctly reflects our real gossip is a trickier question. I actually doubt this model is good enough - the real gossip works in a much more dynamic way, shuffling Kademlia addresses and paths all the time. Of course, we can always test the practical efficiency of your approach by running tests with our production code.

every such fault proof is just a single message that is never “blocked”, but sometimes just replaced by an equivalent message

i think this simpler version can be described the same way. it is just 2 messages that are never blocked, but sometimes replaced by 2 other equivalent messages. i don’t see how it changes the “black box” assumption, if there is one

These messages will not travel in pairs. They travel independently.

Basically, your solution creates a feedback loop inside the gossip protocol itself: the more nodes are aware of a specific bad guy, the more selective throttling they apply at the gossip level to messages originating from that bad guy. Whether this will work OK in practice depends on the internals of the gossip protocol in use.

In contrast, my solution has one advantage and one disadvantage:

  • adding a new message type is a disadvantage
  • being sure that this will work with ANY gossip protocol is an advantage

Now, let me come back to your solution again. Gossip protocols are a whole galaxy of solutions. For example, here is just one starting point of my exploration:

To illustrate why your solution is dangerous, I managed to construct at least one (very weird) gossip protocol in which it fails. But a much more interesting question is: will your solution work with any “reasonable” gossip protocol (like the one we really use in production, via the libp2p library)? For this, I think the optimal approach is simulation. I will do some testing and bring the results here when I am ready.

If you are curious, the “weird” gossip I constructed (which is a correct gossip from the “black box” perspective, but does not work well with your algorithm) is as follows:

  • we assume n nodes in the network, and for math simplicity we assume everyone is connected to everyone; nodes are numbered using [1, n] interval
  • we pick a 1-1 function f: [1, n] --> Perm[1,n] assigning different cyclic permutation of [1, n] interval to every node (cyclic = it is composed of only one cycle of size n)
  • the way broadcast works is: a message m is wrapped into Envelope(m, path, signature), where path ∊ Perm[1,n]
  • the way forwarding works is: a node x, on receiving a message Envelope(m, path, signature), checks whether it is the sender - if yes, the message is discarded; otherwise the message is forwarded to path(x)
  • the malicious node runs the attack by abusing the gossip protocol: it first prepares 3 equivocating messages A, B, C but then wraps them into envelopes with DIFFERENT permutations; then, with a suitable combination of delivery delays, I am able to observe how the protocol locks itself

The locking behaviour is captured in the table below. I assume that the network contains 5 nodes. It shows the history of gossip over time (time flows from left to right). Every row corresponds to the history of one message delivery. So, for example, you can see that message B was delivered to node 4 in second 4. The “bad guy” is node 1.

Blue circle represents the moment of discovering that 1 is an equivocator. Red circle represents the moment when gossiping was blocked because of blacklisting-of-known-equivocators rule.

As you can see, what happens is that first node 2 discovers that 1 is bad (by observing messages A and B), then node 3 discovers that 1 is bad (by observing messages B and C). Then gossiping of message A gets interrupted, and gossiping of message C also gets interrupted. The final outcome is:

  • node 1 is the equivocator
  • nodes 2 and 3 correctly recognized the attack by 1
  • nodes 4 and 5 have no idea that 1 is an attacker (and they will never discover this)

With equivocation proofs, this weird gossip works fine.
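The locking scenario can be replayed in a small simulation. The permutations and delivery delays below are my own illustrative choices (the original table is not reproduced here); the blacklisting rule modeled is the one from the discussion: a node forwards the first two distinct messages from a sender and then blocks everything further from it.

```python
import heapq

N = 5  # node 1 is the equivocator; nodes 2..5 are honest

# One cyclic permutation per message: paths[msg][node] -> next hop
paths = {
    "A": {1: 2, 2: 3, 3: 4, 4: 5, 5: 1},
    "B": {1: 4, 4: 5, 5: 2, 2: 3, 3: 1},
    "C": {1: 3, 3: 2, 2: 4, 4: 5, 5: 1},
}

seen = {n: set() for n in range(2, N + 1)}  # distinct messages per honest node

# Event queue of (time, msg, receiver); A and C travel on slow edges, B on fast.
events = [(1, "A", 2), (2, "B", 4), (3, "C", 3)]
heapq.heapify(events)
SLOW, FAST = 5, 1

while events:
    t, msg, node = heapq.heappop(events)
    if node == 1:              # back at the original sender: discard
        continue
    if len(seen[node]) >= 2:   # node already holds an equivocation pair
        continue               # from node 1, so it blocks further messages
    if msg in seen[node]:
        continue
    seen[node].add(msg)
    delay = SLOW if msg in ("A", "C") else FAST
    heapq.heappush(events, (t + delay, msg, paths[msg][node]))

detected = {n: len(seen[n]) >= 2 for n in seen}
print(detected)  # {2: True, 3: True, 4: False, 5: False}
```

Nodes 2 and 3 each collect a pair of equivocating messages and block the rest, so messages A and C stop circulating; nodes 4 and 5 only ever see B and never learn that node 1 equivocated.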

Thanks for all the detailed explanations @selfdual-brain!

I agree that we need a special object to represent a malfeasance proof. However, I think the structure that nodes must store should be simply:

maliciousIds: Map[NodeId, MalfeasanceProof]

This structure is shared with the Tortoise and Tortoise Beacon as well, and must be stored forever. It’s in the “consensus” in the sense that ballots are: nodes might not agree exactly on the set of malicious Ids, but if any honest node adds an Id to its set, all other honest nodes will also add that Id within \Delta (the gossip network delay bound).

What’s not in the consensus (and has no reason to be) is the set of malfeasance proofs. Two honest nodes might have different malfeasance proofs for the same node Id. (A malfeasance proof can be anything that proves a node behaved dishonestly; this can include a pair of equivocating messages — for Tortoise, Hare or Beacon — but also a signed message that’s provably flawed in other ways).

Regarding the “ever growing list” — I think a good way to think of it is in the context of ATXs. We keep ATXs forever (we need them to verify the entire mesh). Every malfeasance proof is for an identity that already has a valid ATX, and once a malfeasance proof is generated there can be no more data stored for this identity. The size of this proof is small compared to the ATX size, so the “damage” you can do by forcing everyone to store your malfeasance proof forever is less than by just creating another honest ATX.

The f+1-k Rule

@selfdual-brain noticed an important lacuna in my original formulation of the f+1-k rule: it’s not enough to have k proofs of malfeasance - we must have k proofs of malfeasance from eligible parties. However, I think the original rule itself still works. Whenever we have a proof that requires f+1 signatures in the original Hare, we will accept a proof with f+1-k (where k is the number of identities for which we have both malfeasance proofs and valid eligibility proofs).

This means we never rewrite history. If a proof is accepted by an honest party at time t, it will be accepted by all honest parties at any later time. This is because if we start out with f+1-k signatures for any k\ge 0, the only thing that can happen is that x signers that weren’t previously known to be malicious now have malfeasance proofs, so now we have f+1-k-x signatures, but the new k'=k+x, so we remain above the threshold.

Note that we don’t need the honest nodes to agree on which nodes are malicious for this to work — the point is the difference in their signature count is exactly counteracted by the difference in their malfeasance proof count.
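The monotonicity argument can be checked mechanically. A sketch, using the unweighted counts from the argument above: a proof accepted with f+1-k signatures stays accepted when any number x of its signers are later proved malicious, because the signature count and k move in lockstep.

```python
def accepted(valid_signatures, k, f):
    # A proof is accepted when it carries at least f+1-k valid signatures,
    # where k counts identities with both malfeasance and eligibility proofs.
    return valid_signatures >= f + 1 - k

f = 400
sigs, k = f + 1, 0            # accepted at time t, no known equivocators yet
assert accepted(sigs, k, f)

# Later, x of the signers receive malfeasance proofs: we lose x signatures
# but gain x in k, so the proof remains accepted for every possible x.
for x in range(1, sigs + 1):
    assert accepted(sigs - x, k + x, f)
print("threshold preserved for every x")
```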

In this example, B and C should not recalculate anything. Suppose they calculate their status messages at the time you describe as “Initially”. Then B would send a status message for [2,3,7], supported by a proof consisting of all the preround messages (except m2, which B hasn’t seen). C would send a status message with [2,4,7].

Once these messages are sent, there’s no more recalculation, and the messages will always be valid. In this example f=2, and initially k=0. If D receives B’s status message before learning about A’s equivocation, it will verify that each value in B’s message is supported by f+1-k=3 valid signatures, so it will consider it valid. However, if D first receives a malfeasance proof for A (in this case consisting of the pair m1,m2), then D will count only 2 valid signatures for each value, but k=1, so two signatures suffice.

Regarding the “popping up” of the value 5: this can only happen if enough honest parties already have the value 5 so that the adversary could have caused it to be accepted. For example, if E sees the equivocation proof for A before generating a status message, it will include 5 in its status message, because it has two valid signatures for 5. It’s true that neither m1 nor m2 has 5, but since A is malicious it could just as well have generated a message m3 that does contain 5 and sent it to E.

One important thing to note is that for E’s message to always be accepted, it needs to explicitly include the fact that k=1; that is, it needs to include the malfeasance and eligibility proofs for A instead of its signature.

Gossip modifications

As you note, the gossip network can’t blindly gossip malfeasance proofs, since this creates a DoS opportunity. However, I think we can get by with minimal changes to gossip.

IIRC, the current gossip rule goes something like this:

 def receive_object(obj, from_neighbor):
     if we_already_have(obj):
         return  # ignore and don't gossip further
     add obj to storage, gossip to all neighbors except from_neighbor

The difference will be in checking if we already have the object. Currently, we just check if we have another object whose contents have the same hash. In the modified gossip, we’ll have to use a different method depending on the object type:

  • For most object types, we continue using the object hash.
  • For malfeasance proofs, we check if we already have any malfeasance proof for that nodeId (equivalently, we can think of the hash of the nodeId as the “hash” of the malfeasance proof for comparison purposes).
  • For round eligibility proofs we have a slightly more involved rule (see below)

Note that we need these modifications also for the Tortoise, since malfeasance proofs are global (and the Tortoise has a similar DoS vulnerability if we don’t use “smart” gossip).
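The per-type deduplication can be expressed as a key function: gossip treats two objects as “the same” when their keys match. This is only a sketch; the object representation (a dict with a `type` field) is illustrative, not the real object model.

```python
import hashlib

def gossip_key(obj):
    if obj["type"] == "malfeasance_proof":
        # one malfeasance proof per node id, regardless of contents
        return ("malfeasance", obj["node_id"])
    if obj["type"] == "eligibility_proof":
        # one eligibility proof per (node id, round)
        return ("eligibility", obj["node_id"], obj["round_id"])
    # default: deduplicate by content hash, as today
    return ("content", hashlib.sha256(obj["payload"]).hexdigest())
```

With this key, two different malfeasance proofs for the same node collide (and are gossiped at most once), while a pair of colliding eligibility proofs with different contents is itself evidence of malfeasance.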

Gossiping Round-Eligibility Proofs

Ideally, once a nodeId is in the blacklist, we would be able to just ignore any messages relating to that Id. However, as @selfdual-brain noted, that isn’t possible in all cases: if any honest party could possibly have accepted a signature from nodeId, we need to know whether nodeId was eligible, hence we need to gossip eligibility proofs even for some Ids that have been proved malicious. (Note that there’s no need to store these proofs for longer than a Hare instance, since they will never be used afterwards.)

Of course, a malicious user can potentially generate multiple different eligibility proofs for the same round, so we need to use the same “smart” gossip technique to prevent this from being a DoS: two eligibility proofs will be considered identical (for gossip purposes) if they are for the same id and the same round (a pair of “gossip equivalent” eligibility proofs with different contents can also form a malfeasance proof.) I think this works even if the eligibility proof is simply part of the preround message (in which case this gossip logic applies to the entire preround message).

Completely dropping Ids

The argument above only applies to Ids that honest parties might accept.
For Ids whose malfeasance was proved far enough in the past that no honest party would accept their signature, we can simply discard all messages.

How far in the past requires a bit of care, because honest parties don’t have consensus on when an Id was canceled. I think the following rules would work:

  • If the first malfeasance proof for Id was received more than one round before the beginning of the Hare instance, honest parties don’t use this Id when generating their own messages (i.e., even if an eligibility proof is received for this id, it won’t count as part of the ‘k’ for messages the honest node generates).
  • If the first malfeasance proof for Id was received more than one layer before the beginning of the Hare instance, honest parties will completely drop messages from this Id (and won’t accept proofs that rely on it as an eligible malicious party).
  • In between (first malfeasance proof was received more than \Delta before the start, but less than a full layer), the parties will still gossip eligibility proofs and will accept proofs from other parties that rely on them, but won’t use them themselves.

The idea is that if the first malfeasance proof was received by any honest party more than one layer in the past, it will have been received by all honest parties more than \Delta in the past, so there’s no risk of rejecting an honestly generated proof.
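The three tiers can be sketched as a simple classification function. Note the post states the middle cutoff in terms of both “one round” and “\Delta”; the sketch below uses a single `round_dur` parameter for that cutoff, and all times and durations are abstract, assumed units.

```python
def id_policy(first_proof_time, instance_start, round_dur, layer_dur):
    """Classify a proved-malicious Id relative to a Hare instance start."""
    age = instance_start - first_proof_time
    if age > layer_dur:
        return "drop"          # discard all messages from this Id
    if age > round_dur:
        return "gossip-only"   # relay its eligibility proofs and accept
                               # others' proofs that rely on it, but don't
                               # count it toward k in own messages
    return "use"               # may still count toward k in own messages

# A few illustrative calls (times in abstract units):
assert id_policy(0,  100, round_dur=10, layer_dur=50) == "drop"
assert id_policy(80, 100, round_dur=10, layer_dur=50) == "gossip-only"
assert id_policy(95, 100, round_dur=10, layer_dur=50) == "use"
```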