We now have a complete development epic for an initial implementation of protocol versioning, which also provides cross-chain replay protection. We’re planning to start by implementing only a single version identifier, called GENESISID, that’s unique for each chain and will never change. (Later, we’ll add other identifiers that will change with each upgrade. See this SMIP.)
We initially discussed how to include the version ID every time we hash a message or data structure. This would allow us to implicitly include the version ID in every signed message in “zero cost” fashion (i.e., no mesh bloat).
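To make the "zero cost" idea concrete, here is a minimal sketch of a hash that always mixes in the chain identifier. The `GenesisID` type, its 20-byte size, the function name `HashWithGenesisID`, and the choice of SHA-256 are all assumptions made for illustration, not the project's actual API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// GenesisID is a hypothetical fixed-size chain identifier.
// The 20-byte length is an assumption for this sketch.
type GenesisID [20]byte

// HashWithGenesisID returns SHA-256(genesisID || data).
// Because the prefix has a fixed length, the boundary between
// identifier and payload is unambiguous.
func HashWithGenesisID(id GenesisID, data []byte) [32]byte {
	h := sha256.New()
	h.Write(id[:])
	h.Write(data)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	mainnet := GenesisID{0x01} // placeholder values
	devnet := GenesisID{0x02}
	msg := []byte("example message")

	// The same payload hashes differently on each chain, so a
	// signature over one digest is not valid for the other.
	fmt.Printf("%x\n", HashWithGenesisID(mainnet, msg))
	fmt.Printf("%x\n", HashWithGenesisID(devnet, msg))
}
```

The signed or gossiped message itself does not grow: the identifier only enters the digest, and every node on the chain already knows it.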
While working on the dev spec, I noticed that we don’t, in fact, hash every data structure before signing/broadcasting it. (Here is one such example, where we sign the entire hare message, rather than a hash.) We either need to switch to signing a hash of each data structure and include the version ID in that hash as originally planned, or else we need a different approach.
I see three different “levels” where this binding could happen:
The node ID itself, which is based on a public key. Rather than just using the pubkey, we could set the node ID to the hash of the pubkey + GENESISID. (This would prevent a node ID from being used on two networks, but the signers use the keypair directly, so GENESISID would still have to be included there.)
The signers. We have two of these, one for regular signatures and one for VRF. We could hash the GENESISID into each message to be signed.
The hash function itself (subject to the caveat I described above: we don’t currently hash every data structure before signing it).
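As an illustration of the first level listed above, the node ID could be derived roughly as follows. This is only a sketch: `DeriveNodeID`, the `GenesisID` type and its size, and the use of SHA-256 and Ed25519 are assumptions, not what the codebase does today.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// GenesisID is a hypothetical fixed-size chain identifier.
type GenesisID [20]byte

// DeriveNodeID returns SHA-256(pubkey || genesisID), so the same
// keypair yields a different node ID on every chain.
func DeriveNodeID(pub []byte, id GenesisID) [32]byte {
	buf := make([]byte, 0, len(pub)+len(id))
	buf = append(buf, pub...)
	buf = append(buf, id[:]...)
	return sha256.Sum256(buf)
}

func main() {
	pub, _, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	mainnet := GenesisID{0x01} // placeholder values
	devnet := GenesisID{0x02}

	fmt.Printf("mainnet node ID: %x\n", DeriveNodeID(pub, mainnet))
	fmt.Printf("devnet node ID:  %x\n", DeriveNodeID(pub, devnet))
}
```

Note that this only changes the identifier. Signatures made with the underlying private key are unaffected, which is why the signers would still need their own binding.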
Rather than performing this binding at the level of the hash function, it seems the right place to include it is in the signers. Does this have any downstream implications on protocol design or security?
Note: we also need to make sure that GENESISID is bound into the PoST proof. This should flow directly from the ID and/or signer, but right now I think neither is included, which seems wrong. We need to double check this in the code.
It seems like the simplest possible version of this is to bind only two things:
PoST data. This should be bound to a particular GENESISID so that it cannot be used on any other chain. Everything else related to smeshing (ATX, ballot, proposal, block, Hare message, beacon) is downstream of this.
Transactions. These are not generated by smeshers, so we still need to bind these separately to prevent cross-chain replay attacks.
That sounds basically right, as long as we don’t use protocol version IDs. When we add protocol versions, the PoST data should still contain only the GENESISID, while the ATXs, ballots, etc. should have the protocol versions as well.
Even with just the GENESISID, I think we also need to add gossip messages. These aren’t downstream from PoST data, and we should make sure our peers are using the same network. Gossip might be a bit trickier, since it’s possible that we want the same nodes to gossip for both devnet and mainnet, for example, so we might want to think a bit more about this.
I think having a “Versioned Signature” with the same API as regular signatures but that internally prepends the ID to the message should work to prevent any attack I can think of offhand. It might not be enough for a good UX though – you can’t tell if the signature verification failed because the GENESISID was different or because of something else.
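A rough sketch of what such a "Versioned Signature" could look like, using Ed25519 from the Go standard library. The names `VersionedSigner`, `NewVersionedSigner`, and `VerifyVersioned`, as well as the 20-byte `GenesisID`, are hypothetical and chosen only for this example.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// GenesisID is a hypothetical fixed-size chain identifier.
type GenesisID [20]byte

// VersionedSigner signs like a regular signer but internally
// prepends the genesis ID to every message.
type VersionedSigner struct {
	id  GenesisID
	key ed25519.PrivateKey
}

// NewVersionedSigner generates a fresh keypair bound to the given ID.
func NewVersionedSigner(id GenesisID) (VersionedSigner, error) {
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return VersionedSigner{}, err
	}
	return VersionedSigner{id: id, key: priv}, nil
}

// PublicKey returns the public half of the signer's keypair.
func (s VersionedSigner) PublicKey() ed25519.PublicKey {
	return s.key.Public().(ed25519.PublicKey)
}

// prefixed builds genesisID || msg in a fresh buffer.
func prefixed(id GenesisID, msg []byte) []byte {
	buf := make([]byte, 0, len(id)+len(msg))
	buf = append(buf, id[:]...)
	return append(buf, msg...)
}

// Sign signs genesisID || msg; the caller only passes msg.
func (s VersionedSigner) Sign(msg []byte) []byte {
	return ed25519.Sign(s.key, prefixed(s.id, msg))
}

// VerifyVersioned checks sig against genesisID || msg.
func VerifyVersioned(id GenesisID, pub ed25519.PublicKey, msg, sig []byte) bool {
	return ed25519.Verify(pub, prefixed(id, msg), sig)
}

func main() {
	mainnet, devnet := GenesisID{0x01}, GenesisID{0x02} // placeholders
	signer, err := NewVersionedSigner(mainnet)
	if err != nil {
		panic(err)
	}
	msg := []byte("hare message")
	sig := signer.Sign(msg)

	fmt.Println(VerifyVersioned(mainnet, signer.PublicKey(), msg, sig)) // true
	fmt.Println(VerifyVersioned(devnet, signer.PublicKey(), msg, sig))  // false
}
```

The second check shows the UX concern: the verifier only learns that verification failed, not whether the cause was a different GENESISID, a corrupted message, or a wrong key.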
@talm do we also need to bind GENESISID to the nodeID/smesherID? Or is it sufficient to bind only the PoST data (i.e., make initial PoST proof bind to golden ATX that’s unique for a given GENESISID)? Is there any risk that, if we do the latter without doing the former, a smesher could be active on multiple networks (e.g., equivocation) with no cost?
IIRC, we need to bind the smesherID to the PoST data as part of our POPS-VRF solution — the ID includes both the VRF key and a nonce that’s related to the PoST proof (actually computed as part of the PoST initialization, so that we get very little overhead for the PoW). Am I remembering incorrectly?
Unfortunately, although there’s plenty of discussion, I couldn’t find a summary of what our final POPS-VRF protocol is: there are several variants and I don’t remember what we decided on :-(. Did we actually implement it in code?