Our goals for PoST proofs are that 1. we require each smesher to perform a single read pass over all of their PoST data once per epoch, 2. that, other things equal, they be able to easily construct a concise, non-interactive proof of having done this that they can broadcast to the network to prove ongoing eligibility, and 3. that the adversary not be able to “cheat” by being able to generate such a proof without actually having stored (or regenerated) all of the PoST data they claim to have, i.e., that they not be able to successfully generate a valid proof for a larger amount of storage than the data they actually store.
The basic proof generation algorithm is as follows (see also: Summary of suggested protocol for PoST proof / verification):
Parameters:
- hash_threshold: hash output must be lower than this value in order to be valid
- num_required_labels: minimum number of labels required to pass the hash_threshold for a given nonce to generate a valid proof
- Iterate over sequential nonce values [0…]
- Per nonce, perform a single pass over all PoST data (i.e., each label in the tree). Hash each label along with [PoET challenge, nonce] and check if the output is below a certain threshold.
- After finding num_required_labels labels, stop iterating and construct a proof. The proof contains the [PoET challenge, nonce] and the set of matching labels.
Note that, in practice, a smesher can and likely will check many nonces in parallel. In general this algorithm is highly parallelizable and different portions of the tree can also be checked in parallel.
The parameters are chosen such that a home smesher running on modest hardware over a small number of storage units needs to check on the order of a few hundred nonces in order to find num_required_labels matching labels. This process (which is of course probabilistic) should take no longer than the PoET cycle gap, i.e., a few hours, in 99% of cases—in other words, except in rare cases where they get unlucky, a home smesher on modest hardware should have plenty of time to run this algorithm, successfully generate a valid PoST proof, publish it to the network, and submit to the next PoET round before it begins. For detailed math on choosing these params, see Analysis for PoST Parameters.
One problem with this design is that an adversary with a lot of compute power can cheaply check many more nonces in parallel than a home smesher. If they check enough nonces they will eventually find one that allows successful generation of a valid proof of having stored N space units even though the adversary has stored a substantially smaller fraction of the data, say, 50-80%. In this way it represents a “back door” way for such an adversary to partially substitute compute for storage (“back door” because it doesn’t involve regeneration of the data itself).
Our proposed solution to this problem is to make nonces expensive. In other words, a small amount of work must be performed for each nonce that’s tried and proofs of work to this effect must be included in the PoST proof. The upside here is that, assuming the amount of required work per nonce is low enough, the honest smesher (who indeed has all of the PoST data they claim to have) only needs to do the work for a few hundred nonces, whereas the dishonest smesher (who has less data than they claim) needs to perform orders of magnitude more work for an arbitrarily large number of nonces (parameterizable). The downside is that it requires all smeshers to do work, on top of the already resource-intensive process of generating PoST proofs and processing many ATXs each epoch that require validating these proofs.
Our proposed solution to this problem is to have the PoET server perform a modest amount of work, upfront and amortized among many smeshers, and to provide a PoW to this effect along with the PoET proof. The PoET server runs on powerful hardware and can do this work far more efficiently than most home smeshers. What’s more, the work can be done after the generation of the PoET proof is completed, during the cycle gap, when the server is otherwise idle.
The PoET server will perform PoW to validate a number of nonces that would allow a home smesher with up to ~4tb of storage space to successfully generate a PoST proof in 99% of cases. In other words, many smeshers can utilize the same PoW, which is in no way bound to a specific smesher ID. We find this number reasonable due to back of the envelope math: we expect the existing PoET servers, even without changes to hardware, to be able to perform this work in under an hour; and, based on HDD read speeds, 4tb is about the maximum amount of space that a smesher can read from a single disk during a cycle gap, i.e., it represents the effective maximum size of a single smesher’s data commitment given the current set of parameters (and without specialized hardware).
Note that the adversary can also grind on the PoET challenge (part of the input to the hash function used to find successful labels in the process of generating the PoST proof), since our PoET construction is currently very grindable. Changing a single label near the end of the proof of sequential work changes the PoET challenge without affecting the validity of the PoET proof itself. We considered designs to make the PoET challenge ungrindable, such as needing to “seal” the PoET proof itself using PoW, but decided that it’s unnecessary on top of the other ideas presented here.
This design has the advantage that the difficulty and thus the cost of the required work scales linearly with the total committed storage space; if this were not the case, it would still be more economical for the adversary not to store all of the data.
Note that “industrial” smeshers with >> 4tb of committed storage will still need to perform PoW to validate additional nonces in order to generate valid PoST proofs. Home smeshers will not need to do such work 99% of the time, but 1/100 times they will get unlucky, the set of valid nonces proven by the PoET won’t be sufficient to generate a valid proof, and they’ll need to perform additional PoW. This should take a home smesher with a small number of storage units (say, up to 1tb) running on a modestly powerful system no more than 10-20 minutes.
If we’re unable to add this functionality to PoET prior to genesis then all home smeshers will need to perform PoW once per epoch, at least until this functionality has been added to PoET.
It’s also true that this design incentivizes smeshers with > 4tb of storage to split their identities so that each falls below this threshold, meaning that most of the time they don’t need to do PoW on their own. However, we feel that the 4tb size, while large enough for basically all home smeshers, is small enough that it would be impractical for industrial smeshers to split identities with a very large amount of storage space (on the order of petabytes) into, say, hundreds of units and manage each separately. (They may still want to split large identities for other reasons, such as compartmentalizing disk failures).
In general we feel that this design does not change the existing trust model of the protocol that already exists with PoET. Smeshers trust the PoET server to perform work on their behalf and to honestly provide proofs of that work; they can (and do) submit to many PoET servers in order to minimize the exposure to a PoET server that acts Byzantine, and they are also free to run their own PoET server or do the work themselves if they choose.
With respect to the specific hash function chosen for this PoW, the prime candidate is RandomX from the Monero ecosystem but we haven’t made a final decision yet.