Compact encoding for hare

To decrease protocol overhead when communicating proposals in the preround stage, a suggestion has been made to encode the proposal IDs (currently 20-byte blake3 IDs) in a compact form, such that a receiver attempts to reconstruct the original message from the provided hints and then verifies the signature carried in the message.

The suggestion made by Dmitry was to compact the proposal IDs using SipHash (or HalfSipHash). The message signature would still be generated over the full proposal IDs, prior to compaction with SipHash. The message validation flow could be described as follows:

  • A receiver would attempt to reconstruct the message from the compacted proposals.
  • A receiver could ask for clarifications from the sender. These subsequent requests are done on a separate 1-1 stateful channel between the peers (not over pubsub).
  • The sender would then respond with the full proposal IDs that were requested to disambiguate back to the receiver.
  • After a round of possible clarifications the receiver attempts to recover the signature of the message from the content and compares it to the message.
  • If the signature validation fails, we again ask the sender for a full clarification, resulting in a message with the full proposal IDs (untruncated 20-byte blake3 IDs).
  • The receiver attempts a second signature validation.
  • If the signature validation passes, the message is cached (so as to help peers that are gossiped the same message further down the pubsub cascade and may inquire about it).
  • Message gossiping continues, with the cached messages evicted within 2 layers.
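
As a rough illustration of the compaction step described above, here is a minimal Go sketch, assuming the github.com/dchest/siphash package; the key and the 4-byte output width are placeholders, not the actual protocol parameters:

	// A minimal sketch of the compaction step, assuming the
	// github.com/dchest/siphash package. The key (k0, k1) and the 4-byte
	// output width are placeholders, not actual protocol values.
	package main

	import (
		"encoding/binary"
		"fmt"

		"github.com/dchest/siphash"
	)

	// compactID maps a full 20-byte blake3 proposal ID to a short hint by
	// hashing it with SipHash and keeping the low 4 bytes of the output.
	func compactID(full [20]byte, k0, k1 uint64) [4]byte {
		sum := siphash.Hash(k0, k1, full[:])
		var out [4]byte
		binary.LittleEndian.PutUint32(out[:], uint32(sum))
		return out
	}

	func main() {
		var id [20]byte
		copy(id[:], "example-proposal-id!")
		fmt.Printf("compact: %x\n", compactID(id, 0, 0))
	}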

The message exchange (worst case) can be described by the following sequence diagram:

Having already implemented this proposal, the following points should be taken into account:

  • We don’t know which collisions are going to happen, because different peers may have different views of proposals in the network
  • Therefore a collision (shared prefix of a shortened proposal ID) at the sender does not necessarily mean a collision at the receiver (and vice versa, see next)
  • The receiver may have a different proposal that collides with a proposal that the sender does not have (the sender has only one proposal that starts with 0x1b, but the receiver has two proposals, 0x1b00 and 0x1b01). Therefore, the sender may preemptively provide further data about proposals that share prefixes; however, that does not guarantee that this information is of any use to the receiver (since the actual proposals are communicated out of band, not with the preround message).
  • The interactive approach is flawed because it may open the protocol up to more attacks (i.e. senders that spam receivers, forcing them into unnecessary rounds and slowing their processing of messages)

After discussing the matter with @iddo yesterday, we’ve converged on the following approach for now:

  • Remove the interactive part completely - no clarification rounds are allowed
  • Sender constructs the message by compacting the original proposal IDs (truncates them to 2 bytes)
  • For proposals that share prefixes, we add further 4 bytes (implementation details are orthogonal here)
  • The message will be signed just as it is now - with the original full proposal IDs, and sent without the full IDs (just compacts)
  • If a downstream (receiver) peer receives the message and cannot reconstruct the original (i.e. do the magic and verify the signature), it does NOT gossip the message to other peers
  • No message caching is needed in this approach
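
A minimal sketch of what the sender-side compaction could look like under this approach; the 2-byte prefix and the 4 extra bytes for colliding prefixes follow the bullets above, while all names and types are illustrative:

	// Sketch of sender-side compaction under the non-interactive approach; the
	// 2-byte prefix and the 4 extra collision bytes come from the discussion
	// above, everything else is illustrative.
	package main

	import "fmt"

	const (
		prefixLen   = 2 // bytes sent for every proposal
		extendedLen = 6 // prefix + 4 extra bytes when the sender sees a local collision
	)

	// compactProposals turns full 20-byte proposal IDs into variable-length hints.
	// The decision to extend is based only on the sender's local view, so the
	// receiver may still fail to disambiguate and must then drop the message.
	func compactProposals(full [][20]byte) [][]byte {
		counts := map[[prefixLen]byte]int{}
		for _, id := range full {
			var p [prefixLen]byte
			copy(p[:], id[:prefixLen])
			counts[p]++
		}
		hints := make([][]byte, 0, len(full))
		for _, id := range full {
			var p [prefixLen]byte
			copy(p[:], id[:prefixLen])
			n := prefixLen
			if counts[p] > 1 {
				n = extendedLen
			}
			hints = append(hints, append([]byte(nil), id[:n]...))
		}
		return hints
	}

	func main() {
		a := [20]byte{0x1b, 0x00, 0xaa}
		b := [20]byte{0x1b, 0x00, 0xbb} // shares its 2-byte prefix with a
		c := [20]byte{0x7f, 0x42, 0xcc}
		for _, h := range compactProposals([][20]byte{a, b, c}) {
			fmt.Printf("%x\n", h) // a and b become 6-byte hints, c a 2-byte hint
		}
	}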

Some thoughts and concerns I have with this approach are the following:

  • If the consensus about the proposals themselves is not strong, it may be difficult to guarantee that messages get forwarded as the hop count increases in the network.
  • This may increase splits in the view of the network
  • Some network topologies may resolve this
  • This approach still doesn’t resolve the problem that the sender does not know which proposal prefixes may collide at the receiver, causing it to never be able to recover the signature
  • Today, a situation where the signature cannot be recovered is technically not possible in the preround, and I suspect that allowing for a situation where gossip cascades may be interrupted can be dangerous (we don’t have a way to simulate the real network conditions we’re facing)
  • Perhaps, as a first phase, we could go with a compaction to more than 2 bytes (maybe 6 bytes, with possible extra info in case a collision is found) and see whether we encounter any problems in achieving consensus
  • Later on we can further lower the size of compacted prefixes and see how that looks
  • We may opt in to test this on mainnet between our own nodes that participate in hare and collect metrics on how often collisions happen (only those nodes would run the further-compacted hare variation, in parallel)

Thinking a bit more about our discussion, we focused only on the potential for malicious proposers to generate collisions. The simple solution we came up with solves this, but there could potentially be a consensus failure due to a malicious Hare preround participant (as @acud notes, this is because Hare participants don’t see the same set of preround messages, breaking our gossip guarantees).

One way to solve this is to ensure there is graded agreement about the Hare preround messages. To use the existing Hare protocol as a black box, we need a 5-grading scheme (i.e., each preround message will get a “compression grade” from 0 to 5). I think this is possible, but it is a bit tricky, and might require a long delay (e.g., something like 11 gossip rounds) between the layer boundary and the start of the Hare preround.

However, I think we might be able to solve it more efficiently by relying on how gossip works (or at least, how I hope it does 🙂)

The idea is based on the following rules:

Gossip rules

  • If we receive colliding proposals, we always gossip the first proposal we received to all neighbors before gossiping the rest.
  • We always gossip proposals before we gossip the preround message that references them.

(note that by “we gossip z” I mean that we ensure our peer knows z – if our peer learned z from another source that’s fine).

On-the-fly Disambiguation

  • We keep a table mapping each compressed ID to a boolean value reflecting whether or not we’ve seen a collision for that ID.
  • When we gossip a preround message, we add disambiguation to every ID for which we’ve seen a collision (whether or not it appeared in the message sent by our peer).

Decompression rule:

  • We keep a per-peer table that maps each compressed ID to the full ID of the first proposal with that compressed ID that was gossiped by the peer.
  • When we receive a preround message from that peer, we disambiguate using the table. If the disambiguation fails, we drop the message (and the peer).
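
A rough Go sketch of the two tables described above; the type names, the 2-byte compressed-ID width, and the drop behavior are illustrative rather than the actual implementation:

	// A rough sketch of the two tables described above. Type names and the
	// 2-byte compressed-ID width are illustrative, not the actual implementation.
	package main

	import "fmt"

	type compressedID [2]byte
	type proposalID [20]byte

	// collisionSeen: compressed ID -> whether a local collision was observed;
	// consulted when forwarding to decide which IDs need disambiguation.
	var collisionSeen = map[compressedID]bool{}

	// peerTable is kept per peer: compressed ID -> full ID of the first proposal
	// with that compressed ID gossiped by that peer.
	type peerTable map[compressedID]proposalID

	// decompress resolves a preround message's compressed IDs using the per-peer
	// table; any miss means the message (and the peer) should be dropped.
	func decompress(t peerTable, ids []compressedID) ([]proposalID, bool) {
		out := make([]proposalID, 0, len(ids))
		for _, c := range ids {
			full, ok := t[c]
			if !ok {
				return nil, false
			}
			out = append(out, full)
		}
		return out, true
	}

	func main() {
		full := proposalID{0x1b, 0x00, 0xaa}
		t := peerTable{{0x1b, 0x00}: full}
		if res, ok := decompress(t, []compressedID{{0x1b, 0x00}}); ok {
			fmt.Printf("resolved %x\n", res[0][:4])
		}
	}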

Analysis

Suppose preround message X has proposals with compressed IDs (a,b,c). If my peer sent X to me without disambiguations, it means it hasn’t received a collision for a, b, or c yet (or is dishonest).

Let’s consider how an honest peer could have learned a:

  • case 1: the peer received a from me. In this case, I must have received a before any colliding proposal (since I would have gossiped the colliding proposal to the peer first).
  • case 2: the peer received a from another source. In this case, it must have gossiped a to me before the preround message, and moreover this must be the first proposal I received from my peer with compressed ID a (otherwise it would have to be aware of a collision).

So the decompression rule must succeed if the peer is honest.

I also added some notes here, if that helps: compact encoding for hare preround message · Issue #5606 · spacemeshos/go-spacemesh · GitHub

Thanks for your reply @talm. I’ll dissect it to make sure I understand (with further comments).

My understanding of the gossipsub implementation we’re using is that, in theory, the same message can be delivered more than once from the topic mesh overlay; however, the validators would not be triggered twice for the same message (from go-libp2p-pubsub/validation.go):

	// we can mark the message as seen now that we have verified the signature
	// and avoid invoking user validators more than once
	id := v.p.idGen.ID(msg)
	if !v.p.markSeen(id) {
		v.tracer.DuplicateMessage(msg)
		return nil
	} else {
		v.tracer.ValidateMessage(msg)
	}

This means that when we get a proposal and try to gossip it (implicitly) to another peer which has already received it, we must register with the gossipsub tracer to be notified about the duplicate message (since the validator will not be invoked twice for the same message).

I suspect that even if we do that, it might not be a strong enough assurance that we actually do receive duplicate messages. The aforementioned assumption (we always gossip proposals before sending the preround message about them) implies that we must indeed expect and rely on duplicate message deliveries. However, my understanding is that the gossipsub implementation actively tries to avoid that, because it aims to optimize and capitalize on (bounded) flooding.

To conclude: IMHO it might be problematic to rely on this property of gossiping (it may be subject to changes that increase the vulnerability of our consensus, should libp2p change something in their routing heuristics).

I suppose this means encapsulating the original message in another one that decorates it with further information about the local collisions. Does this information need to be signed too? Should there be rules that make sure the forwarder isn’t just trying to confuse the validator? Also, we currently don’t mutate a message which is already in the validation pipeline. We could do that, but I’m not sure that’s the best way of going about it. It is also not clear whether peers further down the forwarding chain would (or should) be allowed to make further modifications to this message. Finally, it might cause “duplicate” message deliveries for the same preround message, since it would mutate in different ways along different edges of the address-space graph (on forwarding), and the same neighborhood may receive different flavors of the message with different annotations, causing different message IDs to be generated for it.

In the scope of resolving the different conflicts, I’d like to suggest perhaps a different approach.
The main problem with compacting the IDs is the collisions IMO. Let’s assume that I have a prefix that I don’t know how to match locally (because I may have multiple proposals that compact to this ID). So how about the following:

  • Sender truncates the original 20-byte hash to n byte hash (simple truncate).
  • Sender then creates a further k bytes that are a simple and cheap operation to perform over the data.
  • Sender adds these k bytes to every entry in the compact ID list
  • Assuming that k bytes are very cheap to create, the receiver assumes that brute-forcing the proposals it knows about in order to disambiguate the colliding proposals is cheap, and therefore it would re-calculate (and potentially brute-force) the k bytes for every compact ID it received from the other peer, thereby hopefully reaching an agreement about the parity data.
  • Today this is not possible since the EC signature is the only way we know if we’re right or wrong on reconstructing the message. Having (cheap) parity data about every proposal can allow us to do more cheap guessing about the content before getting to the signature verification part.

Example of parity data:
The sender of the proposal takes the first 5 bytes of the 20-byte ID, XORs them with the next 5 bytes, and so on until the whole hash has been XORed, resulting in short parity information that can be added to the original message:
Original 20-byte hash: 0x28bfb59a2262a7b034bdaf3024e55bb75b33cce8
Parity information: xor(0x28bfb59a22,0x62a7b034bd,0xaf3024e55b,0xb75b33cce8)=0x527312872c
From here, we truncate the data to 2-byte prefix 0x5273.
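
A small Go sketch of this XOR parity, reproducing the worked example above (parity 0x527312872c, truncated to 0x5273); the function name is hypothetical:

	// Sketch of the XOR parity described above; it reproduces the worked
	// example (0x28bf... -> 0x527312872c, truncated to 0x5273).
	package main

	import (
		"encoding/hex"
		"fmt"
	)

	// xorParity folds a 20-byte proposal ID into 5 bytes by XORing its four
	// 5-byte chunks together.
	func xorParity(id [20]byte) [5]byte {
		var p [5]byte
		for chunk := 0; chunk < 4; chunk++ {
			for i := 0; i < 5; i++ {
				p[i] ^= id[chunk*5+i]
			}
		}
		return p
	}

	func main() {
		raw, _ := hex.DecodeString("28bfb59a2262a7b034bdaf3024e55bb75b33cce8")
		var id [20]byte
		copy(id[:], raw)
		p := xorParity(id)
		fmt.Printf("parity: %x, 2-byte prefix: %x\n", p, p[:2]) // 527312872c, 5273
	}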

This parity data is then added to the original shortened prefix and is decoded when the message is received on the other end, allowing the receiver to use the parity data to brute-force against the existing proposals it knows about. My assumption is that here we don’t need to rely on gossipsub properties (which are subject to change), and that the parity data is less susceptible to collisions.

Other parity data schemes are welcome and maybe even preferable, for example XOR(OriginalHash, VRF); however, I must say that I still don’t have all the possible attacks visualized (nor can I rigorously reason about the cryptographic benefits of each alternative - that I leave to you).

This scheme does not solve the edge case where a receiver node does not know about a given proposal at all.

Would be happy to hear your opinions about this.

EDIT: if the parity data is generated only by XORing the hash itself (no VRF), it can be generated once for every proposal and compared against when reconstructing the original message. It thereby allows us to more accurately “fish out” the right proposal that should be in the reconstructed message. It raises the probability of an accurate guess, and in case we cannot reconstruct the original message (and thereby drop the message/peer), we can expect a less harmful consequence for achieving consensus.

By gossip order, I don’t mean we have to send duplicate messages. We just need to make sure our peer knows about the message. That is, if I receive a proposal X and after that a preround message Y that refers to X, I will never send my peer Y before ensuring that the peer already knows about X. (If our peer already received Y from a different source, we don’t need to satisfy the dependencies, because it already correctly decoded Y).

You’re right that this requires assumptions about gossip, that might not be satisfied by arbitrary gossip protocols (e.g., if gossip works by sending a different subset of messages to each peer, rather than flooding all peers).

However, if our current gossip doesn’t support this it might be worth spending some dev time to add this feature – this kind of guaranteed dependency ordering can make it easier to make things more efficient and prevent DoS in lots of other cases too (I can’t remember specific examples offhand, but I remember we did have previous discussions where it came up).

The auxiliary disambiguation information doesn’t need to be signed, which is why it can be updated as it travels through the gossip network.

Every peer along the forwarding chain has the same logic, so they can all potentially add disambiguations.

It shouldn’t cause duplicate deliveries. This should use the same mechanism we use to prevent duplicate deliveries of malfeasance proofs, even though there may be multiple different proofs for the same identity. The gossip layer should already support something more general than “hash the entire message” to check whether a message was already received.

Regarding your approach – I don’t think it’s a direction we want to take. Even if the “checksum” is cheap, an adversary can potentially force a node to make an exponential number of tests (exponential in the number of different colliding IDs). Even ignoring the code complexity of having to write the exhaustive search, and the fact that a checksum that’s not cryptographic can be attacked directly by the adversary — consider the case where there are 60 colliding IDs, and the preround message has a bad signature. The verifier would have to try all 2^{60} options before it concludes the signature is bad: even if a single test takes 1 picosecond, checking all the possibilities would take almost two weeks.
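
For reference, the arithmetic behind that estimate: 2^{60} tests × 1 ps/test ≈ 1.15 × 10^{18} ps ≈ 1.15 × 10^{6} s ≈ 13.3 days.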

This implies an interactive protocol which would extend the existing gossipsub protocol. If I understand correctly, it would mean we should have some sort of syncing protocol that syncs proposals, rather than relying on gossipsub to convey the messages. Having a syncing protocol could allow us to use a more efficient data structure to determine sync progress (e.g. a modified prefix trie to sync the necessary proposals between peers) and achieve agreement about prior knowledge of proposals.

We have a protocol that fetches proposals right now, but it doesn’t necessarily ensure that we have the same view of the network (it is more of a “pull” than the “push” approach you’re suggesting); it just makes sure that the proposals needed to process the hare output are requested from peers. Perhaps we could think about changing it (or extending it) to be able to request prefixes (or sync them). I just suspect that we might need another window before the preround to make sure that the protocol does what it is supposed to (and so that it doesn’t undermine the existing gossipsub behavior). This would allow peers to agree on the proposals before the preround (currently this happens after the preround, if I understand the code correctly).

Also, that being said, having more protocols and more messages might shoot us in the foot when trying to eliminate the original overhead (which is what this proposal was about in the first place). I wonder whether we have any idea how to approach this (with sane limits - it probably needs some calculation of what would be reasonable defaults in terms of expected collisions/retrievals etc.).

I’d still be careful here, because we probably don’t want to turn into a p2p-library shop. We could fork the libp2p pubsub, but we’d then lose the guarantees that come with the amount of research and manpower they invest into making the protocol do what it is intended to do. I’d personally rather build protocols over a black-box infrastructure that is assumed to be working than start from scratch and go through extensive birth pangs that could be avoided with simple-enough wire protocols that can be reasoned about.

Can’t nodes then tamper with this information to mislead nodes further down the forwarding chain? Or cause further griefing by bloating the messages to the degree that it burdens the network with even more data to pass down?

Fair, we could just check whether the internal message (the encapsulated one) was handled (in the application-layer code, not the gossipsub one, because the latter would definitely not work for this, as it operates on the hash of the whole message). With the current gossipsub protocol there would definitely be duplicate routing and delivery that would have to be prevented from being double-processed at the protocol level (application code).

Thanks for explaining and the consideration.


I thought the gossip was already slightly interactive? Don’t you first send message IDs, and only if they are not known send the entire message? Or is part of that handled in the layer above gossip?

I think what we want is to have gossip support enforced gossip order. This is a very general primitive, and a feature that could be useful for other projects as well, so ideally we wouldn’t fork libp2p but just contribute this feature.

At a high level, I’d guess it would require something like the following:

  • Gossip stores a per-peer queue of messages that should be sent to that peer.
  • Messages in the queue are always sent in order. The layer above gossip can sort the queue (in our case, we would do a topological sort based on dependency between messages)
  • When we receive a message from a peer A:
    • we remove all equivalent messages from A's queue (equivalent can be a custom test that checks something other than just message hash)
    • if this is a new message (not equivalent to any message previously received), we add it to the queues of all the other peers.
  • When we connect to a new peer at time t, we don’t start gossip until we finish an interactive sync so we know we have the same messages up to time t. While we’re syncing, we can enqueue messages for the new peer that were received after time t, and treat new messages learned through sync as if they were gossiped to us (e.g., remove them from the new peer’s queue if they were there, and add them to other peers’ queues if the message was new).

This doesn’t sound very complicated, but I’m not familiar at all with libp2p internals, so I don’t know how hard this is to implement. (Also, there may be some edge cases I’m missing).
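
To make the rules above concrete, here is a very rough Go sketch of the per-peer queue bookkeeping; the types and the equivalence test are placeholders, and the interactive sync with newly connected peers is omitted:

	// A very rough sketch of the per-peer queue idea above; types, the
	// equivalence test, and the peer bookkeeping are placeholders, and the
	// interactive sync with newly connected peers is omitted.
	package main

	import "fmt"

	type message struct {
		id   string
		body []byte
	}

	type peerQueues struct {
		queues map[string][]message    // peer ID -> ordered outgoing queue
		equal  func(a, b message) bool // custom equivalence, not just message-hash equality
	}

	// onReceive applies the two rules above: remove equivalent messages from the
	// sending peer's queue, and enqueue a new message for every other peer so it
	// is sent in order after its dependencies.
	func (p *peerQueues) onReceive(from string, m message, isNew bool) {
		kept := p.queues[from][:0]
		for _, queued := range p.queues[from] {
			if !p.equal(queued, m) {
				kept = append(kept, queued)
			}
		}
		p.queues[from] = kept
		if isNew {
			for peer := range p.queues {
				if peer != from {
					p.queues[peer] = append(p.queues[peer], m)
				}
			}
		}
	}

	func main() {
		p := &peerQueues{
			queues: map[string][]message{"peer1": nil, "peer2": nil},
			equal:  func(a, b message) bool { return a.id == b.id },
		}
		p.onReceive("peer1", message{id: "proposal-X"}, true)
		p.onReceive("peer1", message{id: "preround-Y"}, true) // Y is queued after X for peer2
		fmt.Println(len(p.queues["peer2"]))                   // 2
	}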

A peer can’t mislead, because if you receive a message from it and it doesn’t validate, you know the peer is misbehaving and drop both the message and the peer.

It can bloat if we don’t verify disambiguated proposals, but if that becomes a problem we can just check that every disambiguated proposal actually does have a collision, and remove the disambiguation (and peer) otherwise.

I think we could do that (wasn’t this what @dmitry suggested originally?) but as you say, it adds potential delays to every connection, so the first few gossip rounds would probably have to be longer.

I think the idea would be to “optimistically” run the protocol I suggested above, but without gossip guarantees. In this case, it’s possible that decompression fails even if the peer is honest. Instead of dropping the message, we ask the peer interactively for a full disambiguation (i.e., the full IDs of all the included proposals — this shouldn’t be too horrible on average, since this would hopefully happen rarely). If we can now verify the preround message, we continue to gossip the message with additional disambiguation (since we must have learned of new colliding IDs in this process). If it fails, we drop the peer and the message.

Yes, but the message IDs are internal to libp2p (they generate them from the wrapped messages, which are encoded as protobufs); right now it does not allow for a pluggable implementation.

So the problem IMO is the mesh delivery part. Preserving ordering in a distributed system would require something like vector clocks or Lamport timestamps. Having enforced ordering is indeed nice to have, I agree, but I wouldn’t say it’s trivial. Even the per-peer queue definition deviates to a large extent from the design of the current gossipsub implementation. And due to the nature of the mesh delivery, sending can be ordered, but it might incur a greater overhead on the network because sending can be done in parallel from multiple peers (so you have to continuously be syncing metadata in order to prevent double sending of messages, otherwise you just end up with a floodsub implementation masquerading as something smarter).

Dmitry suggested that peers could ask to resolve individual prefixes when they are unable to resolve them locally, with a possible fallback to a full proposal resolution (aka send everything like we do today). I think that resolving individual conflicts is possible, but it adds further possible rounds to the protocol. We could just settle on what I’ve already implemented, without the individual prefix resolution, with a fast fallback to a full message exchange. It is simple enough and simplifies the protocol (the individual prefix resolutions are more succinct but complicate the implementation and message semantics). I would suggest doing that on just the shortened blake3 hashes, without the SipHash part, always falling back to a full exchange, and seeing how that works. We might want to lengthen the preround as a result.

This can be added to the current implementation that I already have.

@talm I have the following recommendations on how to proceed with this, let me know how this sounds.

  1. Take the already existing implementation that I did for the original proposal that Dmitry made and simplify it further:
  • Only fall back to a full exchange of the whole list of proposals
  • Remove the possibility of inquiring about single prefixes
  • Shorten the blake3 hashes to 4- or 5-byte prefixes, no SipHash
  • Lengthen the preround a bit so that we can accommodate possible further latencies
  2. Once we have this change in and deployed on mainnet, we could check:
  • How often we fall back to a full exchange
  • How many collisions we usually have
  • How long the exchange takes, and how big the impact on the preround length is
  3. Once we have this data, we could proceed and further implement the changes that you suggested, instrumenting the message with further data optimistically with collisions that the forwarder finds locally (we may also learn new things in the process, allowing us to make better and more knowledgeable design decisions based on empirical data). I see this as an optimization that can be built on top of the other, allowing us to gradually introduce changes to the protocol rather than land huge redesigns that are harder to reason about and peer review (by engineers, rather than researchers).

How does this sound?


We don’t need “mesh delivery”. Or, at any rate, we don’t need anything that preserves ordering globally – only ordering per channel, which doesn’t require any special protocol.

You always have to be syncing “metadata” anyway, to ensure your peers saw the same messages you did. Unless the gossip is doing something very different from my mental model, if you receive a message X from peer 1, then you should ensure peers 2,…,n also have message X – is that not the case?

I think making our gossip layer actually work like we want is the “correct” solution, and worth it in the long run. But if we need a quick-and-dirty solution for now, I think the interactive solution I proposed above is a bit simpler and saves a round of communication in the worst case compared to Dmitry’s solution (which, looking at the diagram, seems to have three rounds in the worst case).

I’m talking about the protocol where all preround messages are sent fully compressed (i.e., with short IDs for every proposal), and if the receiving node can’t verify the signature, it asks for the full list of uncompressed proposal IDs. If it still can’t verify the signature, it drops the message (and the peer).

It’s possible that this is what you mean by your suggestion above, in which case we’re on the same page.

I think 4-byte prefixes are probably enough (the probability of a collision among 1000 honest proposals is less than 0.01%). However, we should implement @noam’s suggestion of using a prefix of the VRF output rather than the proposal hash, since this makes it harder to attack the protocol and cause it to fall back to the slow case by forcing collisions.

It’s not enough to lengthen the preround, since the Hare protocol continues to gossip preround messages for at least five rounds afterwards. I think lengthening the first five rounds should be enough (although I’m not 100% sure).

Working from empirical data is fine for most algorithms, but it can’t give you information about what happens during an adversarial attack (since the adversary won’t be attacking while you’re trying to measure).

In our case, it’s critical that we can handle malicious adversaries that are trying to take down the network, since sooner or later we will be dealing with some.

Yes. Just to reiterate, for the sake of clarity, in the form of a diagram (worst-case path):

If we can’t reconstruct the message we just fail-fast and ask for the full list of all 20-byte blake3 hashes.

If the compact message is reconstructed, no message of full IDs is needed.

Can you please elaborate on how to do this? I’m not sure I fully understand what is meant by it.

Good, I think we agree then. There’s one more thing I think we should do – if you already know there is a collision, send the full list instead of the compressed one, since this will save a round that is sure to follow otherwise.

The idea is that instead of encoding a proposal by a prefix of the content hash, you encode it by a prefix of the VRF output used in its eligibility proof. The eligibility proof consists of the VRF evaluated on the layer number and beacon, and is not affected by the contents of the proposal. Thus, in order to purposefully cause a collision, the adversary would have to grind on their identity (or double-sign) rather than just pick different transactions to include.

In any case, you would be keeping a map from the compressed identifiers to the full ID, so it shouldn’t be more difficult to implement.

Each node generates an eligibility proof for its proposal. This proof is a VRF output. The idea was to use a prefix of the VRF output as the compact ID instead of the proposal hash (for example 2 bytes or 4 bytes). The node attaches the VRF proof to the proposal and stores the full VRF output. The node then gossips the proposal with its compact ID and the VRF proof.
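
A minimal Go sketch of that idea, with an illustrative 4-byte width and hypothetical field names:

	// A minimal sketch of the VRF-prefix idea: the compact ID is a prefix of the
	// eligibility proof's VRF output rather than of the proposal hash. The 4-byte
	// width and the field names here are illustrative only.
	package main

	import "fmt"

	type proposal struct {
		ID        [20]byte // full blake3 proposal ID
		VRFOutput []byte   // VRF output from the proposal's eligibility proof
	}

	// compactID returns the first 4 bytes of the VRF output. Since the VRF is
	// evaluated over the layer and beacon rather than the proposal contents,
	// forcing a collision means grinding on identities, not on transaction choice.
	func compactID(p proposal) [4]byte {
		var c [4]byte
		copy(c[:], p.VRFOutput)
		return c
	}

	func main() {
		p := proposal{VRFOutput: []byte{0xde, 0xad, 0xbe, 0xef, 0x01}}
		fmt.Printf("%x\n", compactID(p)) // deadbeef
	}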

Just a side note, regarding this part

This implies an interactive protocol which would extend the existing gossipsub protocol. If I understand correctly, it would mean we should have some sort of syncing protocol that syncs proposals, rather than relying on gossipsub to convey the messages. Having a syncing protocol could allow us to use a more efficient data structure to determine sync progress (e.g. a modified prefix trie to sync the necessary proposals between peers) and achieve agreement about prior knowledge of proposals.

just wanted to mention that it sounds quite like the syncv2 stuff that is in the works (for a rather long time, but hopefully will be reviewable soon): Implement set reconciliation based sync (syncv2) by ivan4th · Pull Request #5769 · spacemeshos/go-spacemesh · GitHub. There’s also some not-yet-in-PR code here (due to temporary complications related to a dependency on a not-yet-merged PR): Commits · spacemeshos/go-spacemesh · GitHub, where I switched from using a red-black tree to a prefix trie, more precisely a binary radix tree, for the sync, because for hash-based IDs there’s no real need for balancing.
The idea for syncv2 is to have it work alongside the gossip to retrieve any missing data quickly from peers, effectively fixing any issues stemming from nodes having different views of the state. The protocol is interactive and is based on exchanging short fingerprints plus counts for ID subranges, which are recursively subdivided. Perhaps it could help with syncing gossiped proposals too, although I have doubts regarding the timing, i.e. whether it’ll be quick enough for hare31.


I haven’t read this entire thread, but I just want to add that I think we should require that proposal IDs in the pre-round message are sorted. This basically means that we can even include two ID prefixes that are the same and the recipient will know the correct order, as long as there are no known excluded proposals with the same prefix. Even if there are such proposals, this significantly reduces the possible number of permutations if the recipient wants to brute-force it. If there’s a small number of possible permutations, isn’t it worth attempting them before asking for full IDs?

So my point is that we should only preemptively include full IDs for a small subset of the proposals in the pre-round message, just enough to make sure that there are no collisions with proposals that are excluded (the order of included proposals is strictly defined).

(I said IDs here, but it works the same way with eligibility proof prefixes if we go that path)

I have another optimization that might require updates to the p2p stack (I haven’t fully read Ivan’s message - it might be related):

I assume that most pre-round messages will point to the exact same set of proposals. If we send a pre-round message after already sending a different pre-round message on the same link (from us to the same remote peer) that has the same set of proposals - we could reference that previous pre-round message instead of sending the hint again. This could result in massive bandwidth savings, as well as quicker (basically zero) reconstruction of the set of proposals.
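
Purely as an illustration of the message shape this would imply (all field names are hypothetical):

	// Purely illustrative sketch of the wire format; field names are hypothetical.
	package hare

	// preroundMsg either carries compact proposal hints or references an earlier
	// preround message sent on the same link that had the identical proposal set.
	type preroundMsg struct {
		CompactIDs [][]byte // compact proposal hints; empty when SameAs is set
		SameAs     []byte   // ID of a previously sent preround message with the same proposal set
		Signature  []byte   // signature over the full, uncompacted proposal IDs
	}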

Asking @ivan4th and @acud: how feasible is it to implement this? It sounds hard to me - is it?

@talm @Amira the changes are already in a PR. I have two questions about this now:

  1. Can we already piggy-back raising the committee size again with this change? Right now it’s 50, and I’d like to raise it back to 800, or at least to 400. WDYT?
  2. What about increasing the preround length? If so, by how much?

Thanks!