Motivation
PoET servers are a potential single point of failure. There are many scenarios in which a PoET can be attacked and possibly fail. In addition, a single widely used PoET service gives its operators centralized control of the network. If a widely used PoET service is malicious or attacked, it can have detrimental effects on the network.
Mitigation
Allow smeshers to register for multiple PoET services and pick one proof to use in their ATX.
Assumption
This is only relevant if multiple entities actually run PoET services. If in reality only the Spacemesh company runs a PoET service, it gets the power to censor smeshers and this mitigation won’t be effective.
Having said that, allowing smeshers to register with multiple PoETs will encourage running additional PoETs. E.g. users will be able to run their own PoET as a backup, even if it’s slower than what Spacemesh provides and is only used as a fallback.
Implementation
Configuration
The current config includes:
PoETServer string `mapstructure:"poet-server"`
Instead of a single string, it should include a list of strings.
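A minimal sketch of what the list-valued field could look like. The field name PoETServers and the key poet-servers are assumptions made for illustration; the proposal only states that the single string should become a list of strings.

```go
package main

// Config is a trimmed-down stand-in for the node configuration.
type Config struct {
	// PoETServers replaces the single PoETServer string. Both the field
	// name and the "poet-servers" key are hypothetical.
	PoETServers []string `mapstructure:"poet-servers"`
}
```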
Submitting and Querying
We currently instantiate an HTTPPoetClient in node.go->App.Start() based on the configured PoET address. We then pass it along to the NIPostBuilder in nipost.go and call Submit() from BuildNIPost().
We don’t currently query the PoET server, since the existing implementation awaits a proof via gossip.
I think the best way to update this flow is to create a list of HTTPPoetClients and iterate over it every time we need to submit a challenge, calling the same method with the same arguments on each one.
It’s also possible to create a smarter client that internally holds a list of servers and does the iterating on its own. I feel that this idea breaks down when querying, since it means either returning a list of responses, or putting the logic for picking a PoET proof inside the client - both options sound bad to me.
Selecting a PoET Proof
When the target time arrives (see below), all PoETs should be queried in nipost.go->BuildNIPost(), under the comment "// Phase 1: receive proofs from PoET service". Then the node must select a single proof to include in its ATX. Of the valid PoET proofs in which the smesher is a member, the one with the highest tick count should be selected.
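The selection rule can be sketched as below. PoetProof and its fields are a hypothetical, reduced representation; in particular, the Valid flag stands in for full proof validation, which is omitted here. When two qualifying proofs have the same tick count, this sketch keeps the first one it sees, which is an arbitrary choice.

```go
package main

import "bytes"

// PoetProof is a hypothetical, reduced view of a proof returned by a PoET.
type PoetProof struct {
	PoetID  string
	Ticks   uint64   // tick count of the proof
	Members [][]byte // challenges included in the round
	Valid   bool     // stands in for full proof validation
}

// hasMember reports whether the challenge is part of the proof's round.
func (p PoetProof) hasMember(challenge []byte) bool {
	for _, m := range p.Members {
		if bytes.Equal(m, challenge) {
			return true
		}
	}
	return false
}

// SelectProof returns the valid proof that includes the challenge and has
// the highest tick count. ok is false when no proof qualifies.
func SelectProof(proofs []PoetProof, challenge []byte) (best PoetProof, ok bool) {
	for _, p := range proofs {
		if !p.Valid || !p.hasMember(challenge) {
			continue
		}
		if !ok || p.Ticks > best.Ticks {
			best, ok = p, true
		}
	}
	return best, ok
}
```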
If no valid PoET proof where the smesher is a member can be found, and some PoETs didn’t provide a proof at all (e.g. they are late) the node will keep querying at regular intervals until the timeout arrives.
In order to reduce the risk of missing another epoch due to a late or failed PoET, the smesher should give up and submit a challenge for the next epoch towards the end of the PoET registration time. This give-up time should also be configurable.
Determining the target time
There are generally 3 relevant deadlines to publish an ATX:
1. Just before the next PoET round begins.
2. Just before the target epoch begins.
3. Just before the last round of the target epoch begins.
Ideally, deadline 1 should be used, so that there’s no slippage. If a smesher published their previous ATX in layer 1000 of the epoch, they should keep doing so. Otherwise smeshers will all slowly (or quickly) slip closer and closer to the second deadline. In the future we may implement incentives or rules that should prevent this from happening.
Deadline 2 is the last opportunity to publish the ATX before the target epoch without risking a missed eligibility (e.g. in case the smesher is eligible for a ballot in the first layer of the epoch). If we wait for that deadline, then unless a PoET round starts right at that time (which we’re not going to do) we miss the next epoch, so we probably shouldn’t.
The 3rd deadline is the last opportunity to get any eligibility out of the ATX. This should only be used as a timeout. E.g. if the node was offline and, when it comes back online, it sees that it has an unpublished ATX and this deadline hasn’t passed yet, it may be worth publishing. If the deadline has passed, the node can just discard the ATX.
So in order to publish the ATX before the next PoET round begins, the node needs to know:
- When the next PoET round begins (for genesis we can hardcode a layer relative to the epoch and just assume that all PoETs start at that time - we’re not doing phased PoETs).
- How long it takes the node to query the PoET, run a PoST proof, construct an ATX and submit a challenge to the PoET for the following epoch. Let’s call this length of time “the PoET delta”. See below for how to determine it.
Determining the PoET delta
The delta depends partly on the hardware used, but mostly on the number of space units, since reading all this data takes time.
We should define some basic time it takes per space unit. We can determine this by benchmarking on the lower end of the supported hardware spectrum. The node will multiply this time by the number of space units allocated.
Whenever the node generates a PoST proof, it should measure how long that takes and, the next time, use the measured time plus some buffer (e.g. 10%) as the delta.
At genesis, the PoET will use some big delta that should allow home miners enough time, but in the future we may do different things, like post a proof early and then update it with more ticks as those accumulate or have different PoETs with different deltas.