Sync/serve protocols

iddo · July 2, 2023, 11:08am

Here’s the sync/serve protocols under the assumption that we want all honest nodes to have all the syntactically-valid ballots (except when there’s a malfeasance proof that entails discarding of dishonest ballots) and all the unified content blocks (UCBs).

Note that sync’ing should be different in case we decide to invalidate this assumption (in favor of aggressive pruning), but that might be significantly less conservative.

Serve protocol:

Purpose: newcomer node fetching all the data of layers that she never reached yet.
The newcomer node requests “serve(start_layer=x, end_layer=y)” from her neighbor(s), and the honest neighbor sends all the syntactically-valid ballots and contextually-valid UCBs of layers x…y as data stream.
Note that the honest neighbor will have rules regarding how to throttle these “serve” requests, specifically:
** not allowing more than N serve requests simultaneously
** not allowing repeated serve requests from the same neighbor (determined according to the neighbor’s IP address)
** using less bandwidth and lower priority for “serve” requests, relative to other communication.
The default honest newcomer may request multiple neighbors to serve her the data concurrently, for example serve(start=1, end=1000) request sent to neighbor Alice and serve(start=1001, end=2000) request sent to neighbor Bob and serve(start=2001, end=3000) request sent to neighbor Carol, etc.

Sync reconciliation protocol for the contextually-valid mesh:

Note: The UCB of layer J has the list of rewarded ballots (i.e., the ballots of layer J that voted on layer J-1 and the earlier layers) with multiplicities.
** For example the list can be “Alice:2, Bob:1, Carol:4” meaning that Alice had 2 rewarded ballots, Bob had 1 rewarded ballot, and Carol had 4 rewarded ballots.
** The honest smesher needs to verify that she indeed has all of these rewarded ballots (in the above example the ballot of multiplicity=2 of Alice and the ballot of multiplicity=1 of Bob and the ballot of multiplicity=4 of Carol), otherwise she requests from neighbors (or else via gossip) the missing rewarded ballots, and as long as these rewarded ballots are unavailable she considers the UCB to be invalid and votes against it (voting still in favor is a less conservative approach).
We have at most one contextually valid UCB per layer.
Every honest node maintains the cumulative hash-chain of the contextually valid UCBs.
The two neighbors perform binary search to find the first layer J that they disagree on.
** When the disagreed-upon layer is discovered, the neighbor Alice send her UCB to neighbor Bob, then Bob computes the symmetric difference (according to his own layer) and sends back to Alice his UCB along with the rewarded ballots that Alice is missing, and then Alice sends him the rewarded ballots that he’s missing (according to the listed rewards in his UCB).
** The neighbors Alice and Bob continue consecutively with layer J+1, then layer J+2, then layer J+3, and so on, performing the same symmetric difference protocol. If the symmetric differences starting from layer J are large (in other words, the intersection is almost empty) then both of them switch to “serve” mode.

Sync reconciliation protocol for the contextually-invalid mesh:

For non-rewarded ballots, each honest node sorts them according to local timestamps (so the ballot that was received most recently will be the last in the sorting, even if it’s a ballot of an old layer), but also maintains their hash-chain according to lexicographical ordering.
In the first sync round the neighbor Alice will send to the neighbor Bob her latest batch (according to ber local timestamp ordering) of non-rewarded blocks (the default batch size is a protocol parameter, it can be e.g. 100). Then Bob computes the symmetric difference (with his own latest batch) and sends the result back to Alice.
If Alice and Bob still don’t agree on the lexicographical hash-chain value (in particular if the entire batch was already known to the other neighbor) then Alice will send her previous batch, and so on (for example, the last batch contains non-rewarded ballots 1 to 100 that were receiving during the first three months after genesis, the one-before-last batch contains non-rewarded ballots 101 to 200 were received during the next two months after that, etc.)
If after several rounds (i.e., batches) Bob never sent any non-rewarded ballots that Alice didn’t already have, then Alice will switch to “serve” mode, and in the end ask Bob to send the ballots that she is missing (the same is also true vice versa, i.e., Bob will switch to serve mode if Alice didn’t send ballots that he didn’t already have).

Sync’ing of contextually-invalid UCBs:

The default behavior of the honest node is not to sync her contextually-invalid UCBs at all, but also not to prune each contextually-invalid UCB for as long as she has honest ballots that voted for it (if all the ballots that voted for the UCB were discarded and replaced with malfeasance proofs, then the contextually invalid UCB should be discard too).
Note that the honest node Alice may store ballots that vote for a hash of UCB for which Alice doesn’t have the preimage.

countvonzero · July 16, 2023, 5:05pm

don’t understand the meaning of this.

our data validation in current implementation is recursive. doing these in parallel will most likely cause memory blowup.

currently each party will just have 1 rewarded ballot in any given block. the single ballot will record the eligibility count for that layer.

but does node switch back to serve node just because one neighbor is behaving dishonestly? why not drop this peer instead?

should this be part of the sync protocol? to me this is more like pruning protocol.
the only reason a node would have contextually invalid block is if it had generated it (according to hare output) locally and later found that the network voted for a different block.

iddo · July 17, 2023, 2:30am

These sync/serve protocols are under the simple/natural assumption that we want honest nodes to have the same data. If we are pruning “contextually invalid” data (such as non-rewarded ballots) then obviously we don’t need to sync such data. Such pruning is far from trivial and quite possibly it’s a bad idea, and if we assess these sync/serve protocols to be pleasant enough then the need for pruning reduces.

There can be sophisticated variants (e.g. header first sync followed by backward sync), but simplest in the above example is that if there are say 1 millions layers in total and the newcomer asks Alice for serve(start=1, end=1000) and serve(start=3001, end=4000) and erve(start=6001, end=7000) and so on, and asks Bob for serve(start=1001, end=2000) serve(start=4001, end=5000) and serve(start=7001, end=8000) and so on, and asks Carol for serve(start=2001, end=3000) and serve(start=5001, end=6000) and serve(start=8001, end=9000) and so on, and the newcomer won’t validate the layer data immediately upon receiving it, but rather merge the streams from Alice,Bob,Carol continuously as they arrive, and validate sequentially from the start until the latest consecutive layer (then wait until more consecutive layers arrive before continuing to validate).

I think that we’re saying the same thing here.
The next point about “needs to verify that she indeed has all of these rewarded ballots” is the important part, are we doing this?

For example, it might be that Alice has a lot of data but Bob was isolated and he has little data, even though both of them are honest.

Anyway, the important thing is that we will have an implementation of the “serve” protocol, and an implementation of the “sync” protocol, and then we can specify the exact rules that decide whether the neighbor is considered dishonest (and should be dropped/blacklisted) or still considered to be honest (and then we switch from “sync” to “serve”).

If say the adversary creates a ballot that votes for an unknown UCB that the adversary created, then the honest miner needs to receive this adversarial UCB before considering this adversarial ballot to be data that can potentially affect the contextually valid mesh. So if this adversarial UCB is sent then by default it should be received and kept, because the ballot might not be adversarial but rather an honest ballot that’s sent after netsplit. To avoid DoS, as mentioned we should discard UCBs if all ballots that vote for them were discarded and replaced with malfeasance proofs.

countvonzero · July 17, 2023, 12:52pm

currently UCB has a list of transactions and rewards. only ATX id and weight are in the rewards, and we don’t verify that the rewarded ballots actually exist locally.

even tho the only time we would sync a UCB from neighbor is when it becomes contextually valid, we can still potentially miss rewarded ballots in this case.

if we failed to sync any transactions in the UCB, then we will not store this UCB locally (and therefore would not vote for it).

iddo · July 17, 2023, 2:54pm

The same should be done regarding the rewarded ballots of the UCB (the weight of the reward should be the same value as the multiplicity of the ballot).

So if Alice has reward with weight 2 and Bob has reward with weight 5 then the honest node needs to have the ballot of Alice that’s voting for the previous layer with multiplicity 2 and the ballot of Bob for the previous layer with multiplicity 5, otherwise the honest node shouldn’t vote for this UCB.

And again, this is under the assumption that we want all honest node to sync all the relevant data so that they will be consistent with each other (if we opt for the aggresive approach then we might want to vote in favor of UCB whose rewarded ballots are missing, but we might want to do it only in some scenarios, or likely never, so we should implementing the fully correct behavior now).

iddo · January 19, 2024, 10:29pm

There’s a proposal to use a set reconciliation procedure for sync:
https://docs.google.com/document/d/1gmR0tC3d5CuUaaqneUiC8lkMordwVNOahHmiTFO2WKk/edit?pli=1
This proposal is problematic because:

It’s naive and therefore defeats the entire purpose of the sync protocol, which is to sync in the presence of dishonest nodes that may obstruct the gossip network and bring it to a standstill.
- With the mesh, honest full nodes typically only need to request “serve” mode for the last layers that they missed (if they went offline).
  - Besides this, the main thing that they need to be able to handle is DoS by dishonest nodes that try to saturate the communication among the direct peers (if the honest nodes wastes much of her communication on talking to her direct neighbors in the gossip network who prolong the sync protoocol and maximize the useless data that’s being sent, and furthermore send each next message just before the maximal timeout).
- In case of a genuine netsplit, indeed the sync protocol will need to do more than that, but as described in the OP (regarding “If after several rounds (i.e., batches) Bob never sent any non-rewarded ballots that Alice didn’t already have”) it’s better to detect and switch to serve mode.
- A major point regarding the sync for mesh is that the adversary can always come up with non-rewarded ballots (a.k.a. votes that can be disregarded for the verifying tortoise) out of thin air, and therefore those need to be handled separately according to timestamps.
- Such a set reconciliation procedure should definitely not be used before analyzing the damage that it may entails when some nodes are dishonest (the linked paper doesn’t analyze this).
The set reconciliation procedure is an overkill in terms of algorithmic complexity even if all the nodes in the network are honest.
- We shouldn’t have a cavalier attitude wrt implementing complex procedures that will later require us tp write code that simulates various kinds of attacks of the full node client and see that it isn’t attempting to run code that’s too demanding and causing it to crash.
  - The mesh is already significantly complex relative to a blockchain, and less demanding networks such as Ethereum blockchain have suffered attacks that crashed some full node client implementations.
- In a complex system such as Spacemesh it’s better to start with simple procedures and consider upgrading to more complex procedures if it turns out that it’s actually needed (for the sync protocol the complex set reconciliation is counterproductive due to the presence of dishoenst nodes, i.e., the simpler protocol is superior).

BTW there are protocol rules that we need to deploy prior to improving the sync, specifically wrt invalding the UCB if any ballots that correspond to its list of rewarded identities are missing.

iddo · January 20, 2024, 9:13pm

To put it very shortly, the new sync protocol is an optimization over existing fetch+sync for the cases when the symmetric difference between sets can be considered to be small, intending to significantly reduce both algorithmic complexity and communication complexity in such cases. It’s also initially intended for ATXs which are most painful to sync currently, to be extended to other objects as necessary.
Typical use cases are: a node goes offline for some hours and then wants to resync without causing too much trouble to its peers, such as having to transmit all of ATX IDs they have for the current epoch.
It can also help with detecting and resuming interrupted initial sync (getting different subranges from different peers).
In other cases, the sync algorithm can easily “degenerate” into something similar to existing sync, but the code will be using streaming approach w/o having to transmit big chunks of data at once. It can also be easily terminated in case it’s taking too long to complete, and existing validation procedures are applied to the data received before propagating them any further.

Replying to the above points:

The existing fetch+sync is uninteresting, we should implement the OP of this thread. If there’s a supposedly convincing reason to do something different, it should be analyzed carefully in a setting with malicious nodes, and only later we can consider implementing it.

As already discussed with @dmitry, the “sync reconciliation protocol for the contextually-invalid mesh” of the OP should be used for ATXs (and malfeasence proofs) too.

Again, we must not implement a naive protocol that assumes that the direct peers are honest (the malicious node can easily inject an old ATX to mess up the sync).

What you need to realize is that there are many other parts of the Spacemesh protocol where we go into great lengths to implement secure mechanisms that are resistant to attacks, and it’d be silly if all of other efforts go to waste just because we implemented a naive sync protocol that allows an adversary to saturate the network and bring it to a standstill.

BTW regarding the “serve” protocol (for a newcomer miner) we also discussed with @noam an alternative in which the ATXs are transmitted backwards (from the newest to the oldest), but adding support for this mode is non-trivial.

As decribed in the OP: “The default honest newcomer may request multiple neighbors to serve her the data concurrently, for example serve(start=1, end=1000) request sent to neighbor Alice and serve(start=1001, end=2000) request sent to neighbor Bob and serve(start=2001, end=3000) request sent to neighbor Carol, etc.”

This should be done in the same manner as the above with “serve(start=2001, end=3000) request sent to neighbor Carol, etc.”

About “having to transmit big chunks of data at once”, there should be a protocol parameter for the sender node (and sane defaults for the receiver node) regarding the maximum that’s allowed in a single request. There is an item list in the OP about the server node w.r.t. ‘rules regarding how to throttle these “serve” requests’.

Streaming can indeed be useful for an honest newcomer who’s being served by an honest node, but (as mentioned) the honest server node needs to throttle the serve requests that she’d be willing to agree to.

Anyway, these considerations have nothing to do with a complex set reconciliation procedure.

ivan4th · January 20, 2024, 9:47pm

One of the issues with syncing ATXs is that presently, ATXs don’t include Layer ID. As far as I understand, adding Layer ID info to the ATX entails a number of difficulties, including additional mechanisms for making sure it’s correct and the node didn’t put some random value instead, and also implementation problems with extending existing ATX data structure. That prompted the need of syncing ATXs on by-epoch basis. If this assumption is wrong, it’s probably a subject to discuss before further diving into the sync difficulties.

There’s of course a possibility to split the range of ATXs in other ways, e.g. passing a number of high bits of the ATX hash to each serve(…), e.g. send one of the neighbors a request along the lines of serve(start=0x00, end=0x10), the next one serve(start=0x10, end=0x20), etc. But this approach may only help with the Serve protocol part.

Additional note: ATXs did include Layer IDs initially, but apparently Layer IDs were removed from ATXs for the reason of not being able to ensure the correctness of these IDs.

ivan4th · January 20, 2024, 11:51pm

As a side note, if Layer IDs are (re-)added to ATXs even without ensuring the validity of these Layer IDs, that could still benefit the range subdivision based sync by achieving better clustering of missing ATXs to sync and thus fewer subdivisions.

iddo · January 21, 2024, 1:12am

@ivan4th There’s no need for layerID field in an ATX for the purpose of the sync/serve protocols because the tick count is the proper measure anyway (and it’s indeed problematic to have such a LayerID field due to consistency failures).

Edit: I meant EpochID rather than LayerID (both can be possible, but unneeded at least regarding serve/sync).

Ragarding serve mode: the newcomer should request ATXs according to tick count.
- If the newcomer wishes to receive ATXs from multiple neighbors in parallel and we assume for example 10000 ticks per epoch, then it can be serve(start=0, end=1000\!\cdot\! 10000) request sent to neighbor Alice and serve(start=1000\!\cdot\! 10000\!+\!1, end=2000\!\cdot\! 10000) request sent to neighbor Bob and serve(start=2000\!\cdot\! 10000\!+\!1, end=3000\!\cdot\! 10000) request sent to neighbor Carol, etc.
  - If in the future there’d be faster PoETs then the newcomer can update the estimated tick-per-epoch when fetching the newer epochs (it’s not a big deal, so the initial implementation can be the simplest).
Regarding sync mode: as mentioned it should be done according to "sync reconciliation protocol for the contextually-invalid mesh”.
- Therefore neither LayerID nor tick count are relevant, but rather it’s done according to local timestamps as described in the OP.

ivan4th · January 21, 2024, 2:10am

What happens if in serve mode, some of the peers turn out to be dishonest, or victims to some kind of netsplit/sync failure or a bug, and send an incomplete set of ATXs? If the ranges are retrieved in parallel and not sequentially, the whole lexicographical hash-chain is undermined then and serve protocol needs to be done from scratch?
Also, say we’re syncing the latest epoch, what kind of cutoff should we impose on “old” ATXs (but for that same epoch) being received, that are inserted somewhere early in this epoch’s part of the hash-chain?
Someone can hold an ATX for a while and then send it out, prompting a cascading effect of peers switching to serve mode (prompting transfer of say 1M ATXs or even more), or am I also missing something here?

iddo · January 21, 2024, 4:36am

@ivan4th The general outline is:

The newcomer node Alice will request her neighbors to serve her all the ATXs and ballots and UCBs.
Then Alice will compute locally the cumulative hash of the contextually-valid mesh.
Then Alice will switch to sync mode and thereby make sure that her neighbors have the same cumulative hash when checking it against a recent layer.
- If the cumulative hash isn’t the same then Alice will sync using the binary search approach (“Sync reconciliation protocol for the contextually-valid mesh” in the OP), and perhaps Alice will need to drop/blaclist neighbors that behaved badly during the sync.
Afterwards, Alice may still receive old “injected” ATXs and ballots, but those will be sorted according to her local timestamps, and disregard for as long as the verifying-tortoise is still happy (in other words, for as long as the total weight of the “injected” ATXs is low enough relative to the weight of the ATXs that are counted by the verifying-torise).
- BTW it’s possible to count ballots from those old “injected” ATXs if they agree with the verifying-tortoise, but for simplicity we can skip that.

However, there is indeed a non-trivial tension between receiving the data and verifying the data.

Alice definitely doesn’t want to delay the verification until receiving all the data.
If Alice is served by only a single node Bob then everything is straightfoward, i.e., Alice can verify one-by-one each the batches that Bob sends (say, verify epoch 101 after receiving epoch 103), and if the verification fails then Alice knows that Bob misbehaved.
If Alice receives the data from multiple neighbors, then upon receiving (say) epoch 103 and verifying epoch 101, if the verification fails then Alice needs to stop the happy flow and ask different neighbors (such as a permuation of her current neighbors) for the data of epoch 101 (perhaps dropping bad neighbors during this unhappy flow).

Also, you are correct in your previous post regarding “passing a number of high bits of the ATX hash to each serve(…), e.g. send one of the neighbors a request along the lines of serve(start=0x00, end=0x10), the next one serve(start=0x10, end=0x20)”, because many ATX are likely to have the same number of ticks (because they use the same PoET server), and therefore the range should be primarily according to the tick count and secondarly according to lexicographical ordering of the hash of the ATXs.

This is exactly why the “Sync reconciliation protocol for the contextually-invalid mesh” in the OP doesn’t use at all the cumulative hash-chain, and instead it relies on ranges according to local timestamps. The cumulative hash is derived only from contextually-valid UCBs (and by inference also rewarded ballots), so injecting data at a later time won’t affect this cumulative hash.

BTW, for ATXs it’d probably be useful (depending on the data structure) to store together with the local (received) timestamp also a boolean tag that specifies whether this ATX is contextually valid (i.e., whether there’s a rewarded ballot that was created by this ATX).

ivan4th · January 22, 2024, 4:51am

From what I understood, we need to sync all the ATXs, including those not referred to by any ballots, e.g. so that all the nodes have the same idea of the active set, and it looks like that “Sync reconciliation protocol for the contextually-invalid mesh” is not applicable to it, b/c these ATXs aren’t included in the hash chain, and thus it’s not possible to determine how many batches need to be checked. So basically we’re left with just “Serve” protocol for them? And it looks like a substantial % of ATXs do indeed fall into this category

> select count(atxs.id) from atxs
  inner join ballots b on b.atx = atxs.id
  where epoch=12;
338517
> select count(atxs.id) from atxs
  left join ballots b on b.atx = atxs.id
  where epoch=12 and b.id is null;
148003

And BTW base tick count / tick height doesn’t appear to be much useful for partitioning the ATX set at the moment indeed, too:

> select count(id) as count, base_tick_height, tick_count
  from atxs where epoch=12 group by base_tick_height, tick_count
  order by count desc limit 100;
335884 |103449 |9393
58824  |103449 |9331
42484  |103449 |8941
18533  |103449 |9159
8278   |103449 |9314
5572   |103449 |4714
5159   |103449 |8675
3818   |103449 |8614
368    |103542 |9415
178    |103542 |8389
52     |103449 |6492
45     |103591 |8985
43     |103449 |8737
36     |103542 |8990
20     |94122  |8991
19     |94122  |8989
12     |103449 |8704
11     |103449 |8984
9      |94122  |8992
9      |103542 |8812
9      |103542 |8858
7      |94122  |9331
7      |94122  |9393
7      |103542 |8545
4      |103591 |7906
3      |94122  |8675
2      |94122  |8614
2      |94122  |9314
2      |103449 |9415
1      |94122  |8929
1      |102744 |9393
1      |103376 |9331
1      |103449 |8917
1      |103542 |676
1      |103542 |6804
1      |103542 |7331

Probably it would be useful for @dmitry to join this discussion, too, especially given that the whole idea of applying some kind of general set reconciliation approach (graphene, IBLTs, etc.) was not initially mine and I only tried to pick the most lightweight algorithm for it

iddo · January 22, 2024, 5:41am

I wasn’t clear when I wrote in my last reply regarding the contextually-invalid mesh that it “doesn’t use at all the cumulative hash-chain”, what I meant is that it doesn’t use the cumulative hash-chain value that’s in consensus (i.e., the cumulative hash value of the contextually valid mesh).

Specifically, you wrote now:

You should read “Sync reconciliation protocol for the contextually-invalid mesh” in the OP, where the two neighbors indeed sync according to a hash-chain value that’d be agreed upon by both of them if they have exactly the same set of ATXs (in the OP it’s about invalid ballots but here we consider ATXs instead), and they will have a different hash-chain value while their set of ATXs isn’t the same. There’s a batch size parameter (using only a default size is enough for now, we can extend it for the neighbors to negotiate the batch size if they wish), and in each round they send the next batch according to their local timestamps (and then compute the hash-chain that disregards the timestamp ordering and instead uses lexicographical ordering that will indeed be agreed upon). As mentioned in the OP, if one node has too little data then it can trigger switching to serve mode, and if one node’s behavior is unreasonable then it should be dropped/blacklisted.

Unless there’s some serious new analysis of this in a network with malicious nodes, the two issues are:

Is there anything bad about it? Yes, it allows an adversary to clog the communication of honest nodes more easily and more severely.
Is there anything good about it? No, because malicious nodes can inject data that will be propagated and cause inferior performance. If all the nodes in the network are honest (which is obviously an unreasonable assumption) then maybe the answer would be yes, but (as mentioned) even then it’d be better for the implementation to start with something simpler.

dmitry · January 23, 2024, 10:51am

there are 3 things that we need to sync:

unrewarded ballots
malfeasance proofs
atxs

additionally it would make things simpler if we could use same protocol for rewarded ballots as well. we are not referencing ballot in the UCB, but only atx and weight. it can be ofcourse inferred that ballot from that atx in the specific layer must exist, and rpc can ask for such ballot. it will require additional functionality to support that, and my preference would be to avoid that.

protocol that was proposed by Iddo seems to work for unrewarded ballots, as the hashchain suppose to be small, and it won’t be a problem to recompute it if ballot is inserted somewhere at the start of hashchain. but how does it work with atxs hashchain? there will be a merge so eventually atx hashchain will become shorter. but i think it would be better to make sync protocol viable for 1_000_000 atxs or so. i didn’t benchmark but it is likely that rehashing 1_000_000 32 byte arrays is not cheap.

would it be possible to use order-independent fingerprint instead of hashchain? for example, something as simple as xor chain.

iddo · January 23, 2024, 3:21pm

It’s not 100% “ofcourse”, there should indeed be only a single ballot per ATX identitiy in each layer (so if a dishonest miner broadcasted two signed ballots for the same layer then that’s a malfeasance proof), but if there’s temporary dishonest majority then there can be a mismatch between proportional reward in the UCB and the weight of the ballot, so in the future we might want to add some additional sanity check (such a sanity check is definitely not needed now).

The implementation of this protocol rule is needed (one major reason is that otherwise the sync can get messed up), but we can start by implementing the other parts of the sync protocol first.

The XOR (among 256-bit valid ATX identities) seems to me to be a good idea that avoid the re-computation of the hash-chain.

The full node can still keep checkpoints of this cumulative XOR, at least for the recent timeframe.
For example, our sync protocol can dictate a check every once every month for the last 6 months, and nothing beyond that.
So, if the Spacemesh has been running for say 5 years, and the two honest neighbors disagree on the commulative XOR then they try to sync the most recent month (by sending the ATX IDs list of that month) and then check if they still disagree regarding the comulative XOR.
- If they still disagree, then they try to sync the one-before-last month, and check again.
- If it turns out that they disagree on more than 6 months, then they switch to serve mode.
  - if it turns out that Alice has all the data that Bob is sending, then she will can still assume that Bob is honest but he was isolated, so she would stop the serve that Bob was sending to her.
It’s also possible not to have the a rule such as “for the last 6 months”, and instead perform a binary search according to the cumulative XOR, but it’s reasonable to assume that two honest nodes that disagree on the old ATXs (that were received until say one year ago) should switch to serve mode.

Amira · January 26, 2024, 11:58am

Sync2 protocol

Pairwise sync

Peer A wants to sync with peer B, they have sets of ATXs. Fingerprints are IDs sorted together, there can be more protection by hashing them (collusion resistant hash).
We have a range of ATXs, we XOR them together to get ID, we keep track of count of how many values are in this range
A sends fingerprint along with its count to peer B
B computes range on its side and compares it to what it had received if the fingerprint matches (sync is done)
If they don’t match, B divides the range. B now has two ranges.
B gets fingerprints of which range and count and sends back those fingerprints to A.
A takes those ranges and checks the fingerprint of each one and if fingerprint matches, we exclude this range because it’s also synchronized
Where different fingerprints are found, we subdivide them again. We compute fingerprints of those parts. And we send the messages back to B.
B has a difference but range is small enough, there can be configurable threshold of how many items are considered small enough, in this case instead of subdivision it will send the item itself.
Peer A accepts the item, recalculates fingerprint for resulting range and sends missing items back Sync is now done

How to expand this to a multi-peer setting?

The idea there is that we make a clone of the local set for each peer and sync these clones against peers in parallel. Then we do local sync of the synced sets against the local main set.
There are 2 modes of multi peer synchronization:

Serve mode: very similar to serve protocol previously described in this post with a bit of optimization
Normal sync: when the peer is mostly up to date and the user wants to get up to date very quickly. So we have the main set of ATXs and if we have persistent data structure, we can, very cheaply, make a number of copies. So we want to see we have a few peers and we want to sync against those peers.

Example: we have 4 peers that want to sync:

For each peer, we’ll make a copy of the main set and begin the sync against each peer.
We propagate it to the main set after data is being validated. This data will be used for next cycles and thus propagated to other peers, but only after validation. So we do not allow pulling data out of thin air.
We do full sync, each copy syncs against the whole range and there will be not much excess transfer because there are few ATXs to see thanks to this subdivision and the differences will be found quite quickly. Not much communication will happen.
If something seems suspicious, we can drop a peer against others.
If the symmetric difference is big, we do not want to transfer the same ATX set from multiple peers over and over again. So instead, we subdivide the range depending on the number of peers and each copy is synced using bound and range. And then synchronization will only happen within those bounds and not outside it.
so when this sync is done, we can switch to normal sync.

Binary tree

Let’s say we have all our ATXs stored in the binary tree. There are subtrees cut off at each point and at the top is the actual value. In addition to the value itself, we also add the fingerprint and count. We compute fingerprints when we build the tree, it’s easily updated as data is inserted and removed from the tree.
We can check the path across the tree that connects the 2 nodes and exclude from it any nodes outside the ranch and then when we go across those paths, we can skip traversing the whole subtrees by just taking in their fingerprints and sort them together. We efficiently compute the fingerprint in logarithmic time. We can use different data structures such as treaps or BTrees which should be more efficient from a CPU perspective.

iddo · February 6, 2024, 3:02pm

This generic protocol is counterproductive for ATXs, because dishonest nodes can inject ATXs by pretending that they were received in whichever section of their timestamp ordered set. Even the paper that you’re basing it on says that the entire thing may not be useful depending on conditions.

As already written:

In other words, I’m saying that the generic protocol is counterproductive because I can explicitly describe to you the behavior of a dishonest neighbor who will keep the honest node performing more and more rounds of this generic protocol for no good reason.

This translates into an attack that clogs the network when many such dishonest nodes will connect to many honest nodes, and cause the honest nodes to believe that they need to continue running this generic sync protocol that you describe.
The honest node should have rules regarding when to stop being clogged down, but this generic protocol is unconducive w.r.t. such rules (i.e., the situation is much better without this generic protocol).

Also the generic protocol requires implementing more logic that needs to be tested for exploits, which is another reason why it’s bad, and why there’s nothing that’s supposed to be good about it for sync’ing ATXs (this generic protocol is the best one can do when there’s no choice of doing something that’s more informed, whereas for Spacemesh this uninformed generic protocol is bad).

And again:

The idea behind what we need to implement is that a dishonest node can alway say “I don’t have data” (i.e., pretend to be a newcomer), and ask to honest neighbor to serve the data, but when the honest node knows that the requested mode is “serve mode” then the honest node can throttle the data that’s being served, as written in the OP:

We don’t have to implement all of these throttle rules in the first version of the sync protocol, but the protocol itself needs to operates in the best way when these rules are added later (when the value of Spacemesh becomes greater so that dishonest network nodes will try to attack by clogging the network).

Therefore, as described briefly in my last post, the protocol for sync’ing ATXs should be:

The honest node Alice keeps all the valid ATXs that she received in a list that’s ordered according to her local timestamps.
- That is, if Alice received atx_bob at 9:00am, atx_dave at 9:15am, and atx_carol at 9:49am, then the list is (atx_bob, atx_dave, atx_carol).
Alice keeps checkpoints of the fingerprints of her timestamps-ordered list:
- The fingerprint of a set of valid ATXs is simply the XOR among all the 256-bit hash identities of the ATX in the set.
- We have two parameter:
  - N: the number of checkpoints.
  - T: the timeframe for every latest checkpoint.
- Reasonable parameters are for example N=6 and T=month
  - Here sync protocol dictates tjhat Alice checkpoints all the ATXs with timestamps that span over one month, and consolidates (by XOR’ing the XORs) all months beyond half a year.
  - Explicitly: starting from genesis with golden ATXs on January 1st 2025 at 9:00am
    - When Alice collects ATXs with timestamps until February 1st 2025 at 9:00am, she will create c_1 which is the XOR of all the IDs of these valid ATXs
    - When Alice collects ATXs with timestamps from February 1st 2025 at 9:00am until March 1st 2025 at 9:00am she will create c_2 which is the XOR of all the IDs of valid ATXs between February and March.
      - Alice also maintains the overall fingerprint which is now c_all = c_1 XOR c_2
    - When Alice collects ATXs with timestamps from March 1st 2025 at 9:00am until April 1st 2025 at 9:00am she will create c_3
      - Alice also maintains the overall fingerprint which is now c_all = c_1 XOR c_2 XOR c_3
    - The same goes own May, June, July, …
    - After Alice collects ATXs with timestamps from July 1st 2025 at 9:00am until August 1st 2025 at 9:00am she will create c_7
      - Now Alice also consolidates c_1 and c_2 by XOR’ing them into c_old = c_1 XOR c_2
      - And now: c_all = c_old XOR c_3 XOR c_4 XOR c_5 XOR c_6 XOR c_7
    - After Alice collects ATXs with timestamps from August 1st 2025 at 9:00am until September1st 2025 at 9:00am she will create c_8
      - Now Alice also consolidates c_3 by XOR’ing it into c_old = c_old XOR c_3
      - And now: c_all = c_old XOR c_4 XOR c_5 XOR c_6 XOR c_7 XOR c_8
    - After Alice collects ATXs with timestamps from September 1st 2025 at 9:00am until October 1st 2025 at 9:00am she will create c_9
      - Now Alice also consolidates c_4 by XOR’ing it into c_old = c_old XOR c_4
      - And now: c_all = c_old XOR c_5 XOR c_6 XOR c_7 XOR c_8 XOR c_9
    - And so on…
Suppose again N=6 and T=month and for example that Spacemesh has been running for approximately 5 years
- Suppose that honest neighbors Alice and Bob disagree on c_all which is the commulative fingerprint of all the ATXs (except for recent ATXs that were received since the start of the month).
- Alice and Bob verify that they disagree on the fingerprint of the most recent month c_60, and if they indeed discover that their c_60 values differ then they sync the entire month (by sending the ATX IDs list of that month) and calculate c_60_new then peform c_all = c_old XOR … XOR c_59 XOR c_60_new
- If they still disagree, then they try to sync c_59 and check again.
- If they still disagree, then they try to sync c_58 and check again.
- If they still disagree, then they try to sync c_57 and check again.
- If they still disagree, then they try to sync c_56 and check again.
- If they still disagree, then they try to sync c_55 and check again.
- If it turns out that they disagree on more than 6 months, then they switch to serve mode.
- if in the previous rounds for c_60,c_59,c_58,… it turned out that Alice always had all the ATXs is the list of IDs that Bob was sending, then she can still assume that Bob is honest but he was isolated
  - Alice will request Bob to stop the sync flow that Bob has been transmitting to her.
  - Alice will switch to sending the ATXs to Bob in “serve mode”, which she will throttle appropriately.

This protocol is suitable for sync’ing ATXs, because it facilitates easy detection of when it’s better to serve instead of sync, and maintains an optimal amount of overhead in terms of computation/communication/storage. This isn’t true in general, but it is true under the assumption that when honest nodes are out-of-sync regarding old data (for example more than half a year old) then it’s pointless to do anything other than serve, which is a conservative assumption for Spacemesh.

ivan4th · February 8, 2024, 2:29am

This actually resembles a special case of the pairwise range sync algorithm I described (which also XORs together the ATX IDs for fingerprinting, extra hashing being optional) with several specific choices:

ATXs are keyed by the N of months before current one instead of ATX IDs (from what I understand, that’s what really matters during the sync, not the full timestamp)
on each step just a single subdivision is done on “this month and the rest” basis
the number of steps before “serve” fallback is limited by 6

But what I’m worried about here is the timestamp-based sync keying. Please correct me if I misunderstood something and the following is wrong.

One question is, during such pairwise sync, when Alice receives some ATXs from Bob, does she use her own or Bob’s timestamps for it, to find out which c_X fingerprints should have these ATX IDs? One would say that using Bob’s timestamps makes sense, b/c this way sync will be idempotent. Also the answer can be “both”, but in this case Bob must also update his fingerprints for the ATXs he sent to Alice, listing the ATX as potentially belonging to several c_X fingerprints. If Alice uses only her own timestamps for received ATXs, then she will not be in sync with Bob according to this algorithm after its run, and repeated sync with Bob will require more transfers.

Hence the ATX timestamps have the property of being self-propagating across the network, and that may indeed be a problem. Let’s call the following “ATX stashing attack”. Let’s say there’s an attacker who controls a number of nodes that produce ATXs but do not publish them immediately. Instead, they stash them and hold them say for 6 months. After that, they start publishing the stashed ATXs with some moderate cadence using 6-month old timestamps. Each node the attacker’s nodes sync with will have to resort to the serve mode, but that’s not the main problem. The main problem is that each node that receives such an old ATX gets “infected” and will trigger serve mode during sync with its peers, even while being completely honest. With the ATX stash being big enough, it’s basically possible to hold the network in “always serve” mode most of the time, reducing its efficiency. And throttling serve mode may even make things worse.

As I said, maybe I missed smth, like non-idempotent sync with local timestamps only being somehow feasible, pls explain if that’s the case.

iddo · February 13, 2024, 1:20pm

First, the way that I specified the protocol in my last post is a little bit too tedious/unnecessary:

It’s ok to argue theoretically that saving the fingerprints of the last (say) 6 months is best, but in practice it’s helpful only in far-fetched scenarios such as Alice and Bob disagreeing on the ATXs of June, then agreeing exactly on all the ATXs in (say) May and April and March, but then disagreeing on the ATXs of February.
Anyway, re-calculating the XOR of say 10000 32-byte IDs in negligible.
It’s also arguable whether we even want to re-calculate the XOR fingerprint and likely waste a round of interaction for each previous month.
- For example, instead of sending the XOR fingerprint for April (after Alice and Bpb already discovered that they disagree on May and June) to test whether they disagree, and then send the list of ATX IDs for April, it’s arguanly superior to skip the first round of interaction and start to handle April by just sending the list of ATX IDs.
The thing is that whenever the honest node receives an ATX that has never been seen before, it’s timestamp will be the latest, and it will be combined into the fingerprint of the latest group, so saving the last several months of fingerprints is also a bit confusing (even though theoretically it’s arguably good).

So it’s better to specify the protocol in a more clean/simple way (without saving multiple old fingerprints), as follows:

The honest node Alice keeps all the valid ATXs that she received in a list that’s ordered according to her local timestamps.
- That is, if Alice received atx_bob at 9:00am, atx_dave at 9:15am, and atx_carol at 9:49am, then the list is (atx_bob, atx_dave, atx_carol).
Alice will keeps the checkpoints c_curr, c_old, c_all which are fingerprints taken from her timestamps-ordered list:
- The fingerprint of a set of valid ATXs is simply the XOR among all the 256-bit hash identities of the ATX in the set.
- We have three parameters:
  - T: the timeframe for collecting ATXs before adding them to the checkpoint.
  - N_1: the maximal number of timeframes for sync disagreement.
  - N_2: the maximal number of timeframes in which “serve only” occured.
- Reasonable parameters are for example T=month and N_1=6 and N_2=3
  - Here sync protocol dictates tjhat Alice checkpoints all the ATXs with timestamps beyond the last month, and consolidates the last month (by XOR’ing the XORs) when a new month begins.
  - Explicitly: starting from genesis with golden ATXs on (for example) January 1st 2025 at 9:00am
    - When Alice collects ATXs with timestamps until February 1st 2025 at 9:00am, she will mix into c_curr the XOR of all the IDs of these valid ATXs (and assign c_all := empty)
    - When Alice collects ATXs with timestamps from February 1st 2025 at 9:00am until March 1st 2025 at 9:00am:
      - At the start of Februrary, Alice assigns c_old := c_curr and c_all := c_all XOR c_old and c_curr := empty
      - During February, Alice mixes into c_curr the IDs of valid ATXs between February and March.
    - When Alice collects ATXs with timestamps from March 1st 2025 at 9:00am until April 1st 2025 at 9:00am:
      - At the start of March, Alice assigns c_old := c_curr and c_all := c_all XOR c_old and c_curr := empty
      - During March, Alice mixes into c_curr the IDs of valid ATXs between March and April.
    - When Alice collects ATXs with timestamps from April 1st 2025 at 9:00am until May 1st 2025 at 9:00am:
      - At the start of April, Alice assigns c_old := c_curr and c_all := c_all XOR c_old and c_curr := empty
      - During April, Alice mixes into c_curr the IDs of valid ATXs between April and May.
    - And so on…
Suppose again T=month and N_1=6 and N_2=3 and for example that Spacemesh has been running for approximately 5 years.
- Suppose that honest neighbors Alice and Bob disagree on c_all which is the cumulative fingerprint of all the ATXs (except for recent ATXs that were received since the start of the month).
- Alice and Bob check whether they disagree on the fingerprint of the last month by sending c_old to each other.
  - If they indeed discover that their c_old values differ then they sync the entire last month (by sending the ATX IDs list of that month)
    - Alice computes c_old_diff := c_old_bob excluding c_old_alice (that is, the XOR of all the ATXs IDs that Bob had in his c_old and Alice didn’t have in her c_old).
    - Bob computes c_old_diff := c_old_alice excluding c_old_bob
  - Alice and Bob now check whether they agree on c_all XOR c_old_diff
    - If yes, then the sync protocol terminates.
    - If no, then Alice and Bob compute c_old_2 which is the XOR of all the ATX IDs from one month before.
      - For example, if c_old was the fingerprint of all ATXs from July to June, then c_old_2 would be the fingerprint of all ATXs from June to May.
      - Similarly, Alice and Bob then compute c_old_2_diff, and repeat.
- If it turns out that they disagree on more than N_1 timeframes (6 months with the recommended parameters), then they switch to serve mode.
- if in the last N_2 rounds (c_old, c_old_2, c_old_3 assuming that N_2=3) it turned out that Alice always had all the ATXs is the list of IDs that Bob was sending, then she can still assume that Bob is honest but he was isolated
  - Alice will request Bob to stop the sync flow that Bob has been transmitting to her.
  - Alice will switch to sending the ATXs to Bob in “serve mode”, which she will throttle appropriately.

Alice uses only her own timestamps (as written above “whenever the honest node receives an ATX that has never been seen before, it’s timestamp will be the latest, and it will be combined into the fingerprint of the latest group”). It would defeat the entire purpose to use anything other than Alice’s local timestamps, because the dishonest neighbor can always lie via “injected” ATXs and thereby prolong the sync (for the contextually-valid ballots we don’t have this problem, as described in the OP).

Indeed this needs to be specified with further details, let’s break it down:

Optimistic fast detection
- As the first step when starting their sync, Alice and Bob will send to each other the fingerprint of c_all_high_grade which was computed (assuming that c_old hasn’t been xor’d into c_all yet) as c_all_high_grade := c_all XOR c_old_high_grade where c_old_high_grade are only the ATXs with grade 2 and grade 1 that were collected into c_old
- See Grading ATXs for the Active Set regarding the grades (with hare3 this gets tweaked a little because there are more than 3 grade).
- If c_all_high_grade is the same, then Alice and Bob terminate (there’s no need to sync).
- An adversary can cause Alice and Bob to disagree on c_all_high_grade (even when Alice and Bob are both honest), by broadcasting a late ATX that receives grade 0 by some honest miners (hence this adversarial ATX might not be rewarded).
- It might be possible to improve this fast detection further, by committing values of of c_all_high_grade fingerprints into the mesh consensus, but let’s not worry about it for now.
- This entire “fast detection” step can be skipped altogether, but it’s better ro keep it because our design should be the most efficient during the happy flow.
If the 1st step failed, then Alice and Bob freeze their c_curr, and send the list of ATX IDs in the frozen c_curr to each other.
- Alice and Bob compute c_curr_union := c_curr_alice union_xor c_curr_bob (that is, the XOR of all the ATXs IDs in the union of the c_curr_alice and c_curr_bob)
- Alice and Bob compute c_all_new = c_all XOR c_curr_union and sends c_all_new to each other.
- If Alice and Bob agree on c_all_new then they terminate (there’s no need to sync).
If the 2nd step failed, Alice and Bob continue to sync the last N_1 months
- In each of the N_1 iterations
  - Alice and Bob wil XOR each previous month’s diff into c_all_new
  - Alice and Bob will send c_all_new to each other, and terminate if they agree on it.
  - If Alice received nothing new during N_2 iterations, she switches to serve mode (as specified above).

Notes:

If Bob is dishonest, then he can come up with a syntactically valid ATX that Alice doesn’t have during each of the N_2 iterations, namely by taking these valid ATXs from the recent ATXs that were broadcasted after c_curr was frozen, and pretending that he received those ATXs a long time ago.
- The full sync protocol should add a punishment that drops/blacklists Bob upon detecting this malicious behavior (the PoET ticks of the ATXs that Bob lie about are too high, and an honest ATX that was broadcasted after the frozen c_curr is likely to have a positioning ATX that’s more recent than what Bob claimed).
- Even without punishment, such ATXs that Bob sends will be considered as “already seen” by Alice (if she received these ATXs after she froze the c_curr fingerprint that she’s using for sync’ing with Bob), so this can go on for at most N_2 iterations.
The happy flow (specifically when everyone is honest) is one of the most important aspects, and here the happy is Alice and Bob sending to each other just a single fingerprint value (that they already calculated before), and then terminate immedaitely.
- We don’t want to use any sync protocol in which the happy flow is more burdensome than this.

ivan4th:

Hence the ATX timestamps have the property of being self-propagating across the network, and that may indeed be a problem. Let’s call the following “ATX stashing attack”. Let’s say there’s an attacker who controls a number of nodes that produce ATXs but do not publish them immediately. Instead, they stash them and hold them say for 6 months. After that, they start publishing the stashed ATXs with some moderate cadence using 6-month old timestamps. Each node the attacker’s nodes sync with will have to resort to the serve mode, but that’s not the main problem. The main problem is that each node that receives such an old ATX gets “infected” and will trigger serve mode during sync with its peers, even while being completely honest. With the ATX stash being big enough, it’s basically possible to hold the network in “always serve” mode most of the time, reducing its efficiency. And throttling serve mode may even make things worse.

Regarding this scenario:

Suppose that Alice and Bob are honest, and Carol is dishonest.
Carol sync with Alice, and will transmit to Alice an ATX that Carol supposedly received more than 6 months ago.
- Alice will trigger “serve” mode with Carol, but as mentioned this is unavoiding (Carol can always pretend to be a newcomer that has no data whatsoever and ask to be served).
Alice stores according to her own local timestampt the ATX that’s supposedly more than 6 months old that Carol sent, so this ATX is stored in the latest c_curr checkpoint of Alice.
When Alice and Bob sync later, the supposedly 6-months old ATX will be sync’d during the first iteration.
- Therefore it will not trigger “serve” mode between the honest Alice and the honest Bob.

As mentioned earlier:

iddo:

The idea behind what we need to implement is that a dishonest node can alway say “I don’t have data” (i.e., pretend to be a newcomer), and ask to honest neighbor to serve the data, but when the honest node knows that the requested mode is “serve mode” then the honest node can throttle the data that’s being served, as written in the OP:

iddo:

** not allowing more than N serve requests simultaneously
** not allowing repeated serve requests from the same neighbor (determined according to the neighbor’s IP address)
** using less bandwidth and lower priority for “serve” requests, relative to other communication.

We don’t have to implement all of these throttle rules in the first version of the sync protocol, but the protocol itself needs to operates in the best way when these rules are added later (when the value of Spacemesh becomes greater so that dishonest network nodes will try to attack by clogging the network).

Topic		Replies	Views
Verifiable shuffle Research	0	164	November 6, 2023
Deferred Block Fetching (a.k.a. Dangling Pointers) Research	17	700	June 26, 2023
Compact encoding for hare Research hare	15	101	July 17, 2024
Hare equivocations defense Research	11	620	June 9, 2023
Achieving consensus on epoch participants collection Research	1	312	January 11, 2023

Sync/serve protocols

Related topics