Here’s a summary of the optimized protocol.
Names and Definitions
We use the following list of ingredients.
Tools

A “fast” random oracle H that takes nbit inputs and produces mbit outputs. In practice:
 For
sha256
, n=256 (or anything we want really) and m=256  For
aes
, n=128 and m=128  For
siphash
, n=256 (or anything we want) and m=64. Note that both AES and Siphash are keyed functions; we put the challenge in the key instead of as a prefix to the input)
 For

A “slow” random oracle W that takes \log N bits and outputs 8 bits. This is the oracle used to generate the PoST Table; cell i in the table has the value W(i). (in practice W is our version of
scrypt
, with the appropriate ID prefix already included).
Below, we’ll the +
operator for concatenation. Concatenation is bitwise (e.g., if x
is a 4bit value and y
is a 3 bit value, then x+y
is the 7bit value consisting of the 4 bits of x followed by the 3 bits of y).
Inputs

Challenge
ch
. Think of the challenge as a random 64bit string (in practice, the adversary can affect this a bit, but we’ll ignore this for now)  PoST table. Each cell of the table is a label, which in our case is 8 bits. Think of the table size as ~N=2^{38} labels (that’s 256 GB of data).

Label block table
T
. We partition the PoST tables into blocks of B labels. (e.g.,T[0]
consists of the concatenated PoST table labels 0,\dotsc,B1,T[1]
is the concatenated PoST labels B,\dotsc,2B1, etc.).
Parameter Summary
 N the number of labels in the PoST table
 B: the label block size (how many labels in a labelblock)
 k: how many “good” labelblocks make up a proof.
 d: the “difficulty threshold” for a good labelblock.
 d=\log{N}  \log{B} the number of bits required to express the difficulty threshold.

max_nonce
: the maximum nonce index (an integer, think of it as <100). 
num_nonces
: The number of nonces an honest party computes in a single pass over the data.  n the input size of the fast oracle H
 m the output size of the fast oracle H.
Proof and Verification
A proof consists of:
 a challenge
ch
 a nonce
nonce
 An array
P
of k labelblock indices: (P[i]
is an index of size \log(N)\log(B) bits)
To verify the proof, the verifier does the following
for all i in 0..k:
# Reconstruct labelblock
start_index = B*P[i]
block = W(start_index)+W(start_index + 1) + ... + W(start_index+B1)
# Check that it passes threshold
# compute the index of the fastoracle query that will contain `nonce`.
nonce_block = floor( (nonce * d) / m ) # This is a log(max_nonce)bit value
nonce_offset = (nonce * d) mod m
msb_bits = min(d, m  nonce_offset)
out_msbs = H(ch + block + nonce_block) # this is an mbit value
if msb_bits < d
lsb_nonce_block = nonce_block+1 # this is a log(max_nonce)bit value
out_lsbs = H(ch + block + lsb_nonce_block) # this is an mbit value
else:
out_lsbs = 0
# Possible offbyone errors below: caveat emptor
out = out_msbs[nonce_offset : nonce_offset + msb_bits  1] + out_lsbs[0 : d  msb_bits]
# out is a dbit value (the + above is concatenation).
if out > d:
reject proof
The intuition here is that for each block, we think of the output of H as a “stream” of bits, where the first m bits are given by H(ch+block+0)
, the next by H(ch+block+1)
, etc. Each d bits of this stream is the value for a single nonce.
For example, if N/B = 2^{32} and m=256, d=32, and each hash invocation produces exactly 8 nonces. So nonce 0 is the first four bytes of H(ch+block+0)
, nonce 1 is the next four bytes of H(ch+block+0)
and nonce 9 is the first four bytes of H(ch+block+1)
.
The code above also handles the case where d doesn’t evenly divide m; for example, if d=40, then nonce 0 is the first 5 bytes of H(ch+block+0)
, but for nonce 6, the last two bytes of H(ch+block+0)
are its MSBs, while the first 3 bytes of H(ch+block+1)
are the LSBs.
Verification cost
For full verification, at worst each verifier invokes W k\cdot B times and H 2\cdot k times.
Prover
# Number of calls we have to make to H to get all nonces for single pass.
num_outs = ceil(num_nonces * d / m)
# Initialize lists of good indices
good[0] = [], ... , good[num_outs  1] = []
# Make a pass over entire labelblock table.
for i in 0...N/B:
out = H(ch + T[i] + 0) + ... + H(ch + T[i] + num_outs  1) # + is bitconcatenation.
for j in 0...num_outs  1:
if out[j*d:j*d+d1] < d:
add i to good[j] # found a good labelblock for nonce j
if length(good[j] == k):
return (j, good[j])
# we didn't find anything in the first pass; we do it again with a second pass, starting at nonce num_nonces.
Prover cost
Assuming a single pass (which is enough with pretty high probability):
For each labelblock, the prover invokes H \lceil \texttt{num\_nonces}\cdot d/m \rceil times. Thus, the total cost is \lceil \frac{\texttt{num\_nonces}\cdot d}{m} \rceil \cdot \frac{N}{B}
Parameter examples.
We’re currently thinking of N=2^{38} (for one space unit), k=800, B=16 and num_nonces=20
. This means the total number of blocks is 2^{34}, and d=34.

If we’re using AES, then m=128, so we need 6 invocations per block. This means for a single proof we’d need 6\cdot 2^{34}\approx 2^{36.6} AES invocations. For verification, we’d need 800\cdot 16=12800 scrypt invocations and 1600 AES invocations.

If we’re using Blake3512, then m=512, so we need 2 invocations per block. This means for a single proof we’d need 2\cdot 2^{34}\approx 2^{35} Blake3512 invocations. For verification, we’d need 800\cdot 16=12800 scrypt invocations and 1600 Blake3512 invocations.
Note: It might be possible to directly get the 680 bits we need for 20 nonces from Blake’s variableoutput length capability, with better efficiency than calling the function twice.

If we’re using Siphash, then m=64, so we need 11 invocations per block. This means for a single proof we’d need 11\cdot 2^{34}\approx 2^{37.5} Siphash invocations. For verification, we’d need 800\cdot 16=12800 scrypt invocations and 1600 Siphash invocations.
The breakeven point is that Siphash must be more then 5.5 times faster than blake3512 to be better over all, and AES must be more than 3 times faster.