Achieving consensus on epoch participants collection


Historically, Spacemesh design was created in opposition to proof-of-stake tradition: it is enough to buy storage, initialize it, form a proof-of-spacetime and announce this proof as an ATX to become a member of an epoch. This way the participation in Spacemesh blockchain is supposed to be “fundamentally decentralized” and resistant to censorship.

Unfortunately this pushed us into problems - which can be summarized as follows:

  • although Tortoise can live with “approximate” consensus of who is the member of an epoch …
  • … this is not that easy in the case of Hare and rewards mechanics: in both we actually need STRICT consensus on the set of epoch participants; with the lack of such consensus strange things happen and this strange things can be pretty easily triggered by equivocating at ATX level

I am creating this thread to bring together overall discussion around fixing this problem. In other words I would like us to bring ideas on introducing ATX-level consensus in Spacemesh blockchain so that eventually we can pick the best one.

I am using the term “smeshers(n)” as a shortcut of “the agreed collection of Spacemesh blockchain identities participating in epoch n”, where “agreed” means of course “agreed by running a specific consensus protocol” - this consensus protocol is currently not existing and we are trying to introduce one.

Let me start by throwing 3 random ideas by myself.

Idea 1: Reuse existing consensus

This solution basically clones the pattern from proof-of-stake way of dealing with epoch members: introduce bonding transactions.

The way to do it is:

  • introduce special "“bonding” transaction: such a transaction would read as a request of a smesher S to participate in a given (future) epoch E
  • a bonding transaction would include a relevant ATX in its body (or just the hash of the ATX)
  • the data structure with all the epochs and their participants would be part of the global state

Whether or not the bonding transaction has a principal and therefore is subject to normal payment of the gas is a design decision to make. I think both options have its pros and cons.

Naturally, bonding transactions would be flowing-in constantly. Smeshers(e) would be then “frozen” at layer firstLayerOf(e) - K, where K is a protocol parameter (I would suggest K=864, which corresponds to 3 days). The K distance is there to ensure that that the finalization of smeshers(e) is strong enough at the moment of epoch beginning.

The beauty of this solution is that we simply reuse already established consensus mechanism to get a consensus on ATX list, so we do not introduce anything fundamentally new.

Idea 2: Parallel blockchains

This idea can be seen as a subtle refinement of the previous one.

Long time ago we decided that Spacemesh blocks - contrary to Ethereum - do not contain a pointer to parent block. This design decision allows for some more flexibility and performance optimizations (Hare operating independently from Tortoise and consensus-execution decoupling). On the other hand, the price to pay is dealing with “reorder” phenomenon and some other complexities.

For clarity, let me introduce short names for these two competing patterns of blockchain structure:

  • detached-blocks: is the pattern where blocks do not have pointers to parents (this is the pattern we use in Spacemesh)
  • blocks-tree: is the pattern where every block (with the exception of Genesis block) has a pointer to the parent block; this is a pattern used in Ethereum and several other blockchains

Our current Spacemesh design is based on having 2 blockchains which run in parallel on the same P2P network:

  • random beacon - a simplistic blockchain existing as shared randomness source
  • transactions-chain - a “detached-blocks” blockchain based on Hare+Tortoise machinery we use to achieve consensus on transactions order

The new design I propose would be:

  • random beacon - a simplistic blockchain existing as shared randomness source
  • ATX-chain - a “blocks-tree” blockchain - just to achieve consensus on smeshers(n)
  • transactions-chain - a “detached-blocks” blockchain based on Hare+Tortoise machinery we use to achieve consensus on transactions order

The ATX-chain would be just yet another independent blockchain, reusing the already established Hare and Tortoise machinery as-is, with the following differences:

  • there is no consensus-execution decoupling: blocks in ATX-chain include the hash of global state after executing this block (same as in Ethereum)
  • the execution of an ATX (seen as a transaction in this blockchain) results in creating a corresponding entry in the members-in-epochs-map; similarly to “Idea #1” we want to maintain the global state which contains this epoch-member-weight data structure
  • ATX-chain has far less layers than the transactions-chain; I suggest 1 block per 12 hours
  • both ATX-chain and transactions-chain have 14 days epochs, but with suitable translation: ATX epoch 42 should end K days before transactions-chain epoch 42 (say K=3)

Let me say something more about the execution of transactions and the structure of the global state in ATX-chain:

  • In this blockchain there is no gas and no principals
  • A transaction is just ATX
  • publishing more than one ATX targeting the same epoch is considered a malfeasance (with the normal malfeasance handling)
  • global state contains a structure: `members: Map[Epoch → Map[SmesherId->ATX]]
  • execution of a transaction results in adding one entry in members
  • entries in members are never overwritten - because only a duplicated ATX would do so, but they must be purged by block formation procedure

Now comes the trick with how ATX-chain defines smeshers(n). Please notice that now we have two blockchains (ATX-chain and TX-chain) with independent epochs (and shifted by K days). However we will define things so that members of epoch N are the same in both blockchains.

The key trick here is that - thanks to blocks-tree - ATX-chain is able to recursively define smeshers(n) for itself: we define smeshers(n) as members(i) where i is the last layer of epoch n-1 in ATX-chain. It other words:

The collection of participants of epoch N is achieved by freezing the snapshot of the members registry as it was in last block of epoch N-1

Hence, immediately as a new epoch starts, we know who are members of this new epoch. Why this is working without waiting for the consensus to happen ? Well, because the underlying tree structure of blocks ensures that WHATEVER path of blocks will be finally selected as the finalized consensus, every path determines smeshers(n) for all n values.

Now, the last thing to do is to propagate the consensus on smeshers(n) to TX-chain. In this case we HAVE TO wait for the consensus to happen, so this is why we need the shift between ATX-chain epochs and TX-chain epochs. We simply define smeshers(n) in TX-chain to be the same collection as smeshers(n) in ATX-chain (but the epoch n in TX-chain will of course start K days after the same epoch in ATX-chain, hence by that time we hope to have good-enough finalization of last block in layer n-1 of ATX-chain).

Idea 3: Red and blue marbles

This solution would work well only if we are able to shape incentives and overall Spacemesh design so to observe approximately constant flow of ATX-es across the timeline of an epoch.

Assuming 1 layer every 5 minutes, we have 288 layers per day. Assuming 1 epoch = 14 days we have 4032 layers per epoch. If the network size is 100k nodes we have ~25 new ATXes showing up per layer.

The idea is to reuse already established infrastructure of Hare/Tortoise. Hare establishes consensus on collections, where elements of collections can be “anything”. Say elements of the collections in Hare are “marbles”. So far we had marbles to be proposals. In the new solution we will have:

  • blue marbles: they are proposals (as before)
  • red marbles: they are ATX’s

So, blue and red marbles are mixed up in the same collection, and as we run Hare, the consensus result will be a collection composed of blue and red marbles. Then, we build the unified block with blue and red marbles: blue marbles add up to form the “transactional” part, while red marbles add up to form the “ATX” part.

The ATX parts of all block are considered as registry of incoming ATX’s. The deadline for providing an ATX for epoch N should be some K layers before the beginning of epoch N (say … 3 days). Then, smeshers(N) is formed up as a sum of ATX parts of blocks up to the “deadline” layer. Of course nodes mush also manage the buffer of ATX’s they see, which did not make it into the blocks yet. This is analogous to the mempool on fact.

Or, should I say, the whole solution is pretty much a copy of “Idea 1”, just being more verbose in the way we represent global state and showing that “Idea 1” can be implemented with relatively minimal changes to the existing codebase.

1 Like

Thanks Wojtek. Lots of good food for thought. I’m looking forward to discussing this on the call tomorrow. Just a few quick preliminary thoughts:

  • I don’t think that the goals of permissionlessness and strict consensus are at odds. The key thing for permissionlessness is that ATXs are free. They don’t need to come from an account (principal) that already has coins. (Censorship resistance is a more nuanced topic and we can discuss it later.) This doesn’t prevent them from being included in proposals or blocks or subject to consensus. (As we discussed before there should be a fee associated with an ATX since it does occupy space and involve processing, but this fee can be offset against a smesher’s rewards.)

  • There’s no need to create a new class of “bonding” transaction. This is precisely what an ATX is. Rather than saying “a bonding transaction includes an ATX”, you should say, “the ATX includes a NiPoST proof.”

  • Your parallel blockchains idea sounds a lot like Ethereum’s EL-CL decoupling. As you know I’m a big fan of modularity and I’m all for this idea, but it will require more long term thought and planning.

  • As usual, what appears to be missing here is prior art. How do other (PoS) chains approach these questions? How do they achieve consensus on the current/incoming validator set? Is it coupled to execution or not? How long do they wait for finality, and how do they achieve confidence on finality of the set? Etc. It would be helpful to have this information for context.

  • You didn’t directly address the problem of bootstrapping a broken Hare. If we make the active set dependent upon Hare (as in your idea #3), then consensus is dependent upon Hare, and Tortoise is dependent upon Hare, and if Hare breaks… how do we get Hare going again (i.e., bootstrap a new Hare active set)?