Sync/serve protocols

iddo · February 13, 2024, 1:20pm

First, the way that I specified the protocol in my last post is a little bit too tedious/unnecessary:

It’s ok to argue theoretically that saving the fingerprints of the last (say) 6 months is best, but in practice it’s helpful only in far-fetched scenarios such as Alice and Bob disagreeing on the ATXs of June, then agreeing exactly on all the ATXs in (say) May and April and March, but then disagreeing on the ATXs of February.
Anyway, re-calculating the XOR of say 10000 32-byte IDs in negligible.
It’s also arguable whether we even want to re-calculate the XOR fingerprint and likely waste a round of interaction for each previous month.
- For example, instead of sending the XOR fingerprint for April (after Alice and Bpb already discovered that they disagree on May and June) to test whether they disagree, and then send the list of ATX IDs for April, it’s arguanly superior to skip the first round of interaction and start to handle April by just sending the list of ATX IDs.
The thing is that whenever the honest node receives an ATX that has never been seen before, it’s timestamp will be the latest, and it will be combined into the fingerprint of the latest group, so saving the last several months of fingerprints is also a bit confusing (even though theoretically it’s arguably good).

So it’s better to specify the protocol in a more clean/simple way (without saving multiple old fingerprints), as follows:

The honest node Alice keeps all the valid ATXs that she received in a list that’s ordered according to her local timestamps.
- That is, if Alice received atx_bob at 9:00am, atx_dave at 9:15am, and atx_carol at 9:49am, then the list is (atx_bob, atx_dave, atx_carol).
Alice will keeps the checkpoints c_curr, c_old, c_all which are fingerprints taken from her timestamps-ordered list:
- The fingerprint of a set of valid ATXs is simply the XOR among all the 256-bit hash identities of the ATX in the set.
- We have three parameters:
  - T: the timeframe for collecting ATXs before adding them to the checkpoint.
  - N_1: the maximal number of timeframes for sync disagreement.
  - N_2: the maximal number of timeframes in which “serve only” occured.
- Reasonable parameters are for example T=month and N_1=6 and N_2=3
  - Here sync protocol dictates tjhat Alice checkpoints all the ATXs with timestamps beyond the last month, and consolidates the last month (by XOR’ing the XORs) when a new month begins.
  - Explicitly: starting from genesis with golden ATXs on (for example) January 1st 2025 at 9:00am
    - When Alice collects ATXs with timestamps until February 1st 2025 at 9:00am, she will mix into c_curr the XOR of all the IDs of these valid ATXs (and assign c_all := empty)
    - When Alice collects ATXs with timestamps from February 1st 2025 at 9:00am until March 1st 2025 at 9:00am:
      - At the start of Februrary, Alice assigns c_old := c_curr and c_all := c_all XOR c_old and c_curr := empty
      - During February, Alice mixes into c_curr the IDs of valid ATXs between February and March.
    - When Alice collects ATXs with timestamps from March 1st 2025 at 9:00am until April 1st 2025 at 9:00am:
      - At the start of March, Alice assigns c_old := c_curr and c_all := c_all XOR c_old and c_curr := empty
      - During March, Alice mixes into c_curr the IDs of valid ATXs between March and April.
    - When Alice collects ATXs with timestamps from April 1st 2025 at 9:00am until May 1st 2025 at 9:00am:
      - At the start of April, Alice assigns c_old := c_curr and c_all := c_all XOR c_old and c_curr := empty
      - During April, Alice mixes into c_curr the IDs of valid ATXs between April and May.
    - And so on…
Suppose again T=month and N_1=6 and N_2=3 and for example that Spacemesh has been running for approximately 5 years.
- Suppose that honest neighbors Alice and Bob disagree on c_all which is the cumulative fingerprint of all the ATXs (except for recent ATXs that were received since the start of the month).
- Alice and Bob check whether they disagree on the fingerprint of the last month by sending c_old to each other.
  - If they indeed discover that their c_old values differ then they sync the entire last month (by sending the ATX IDs list of that month)
    - Alice computes c_old_diff := c_old_bob excluding c_old_alice (that is, the XOR of all the ATXs IDs that Bob had in his c_old and Alice didn’t have in her c_old).
    - Bob computes c_old_diff := c_old_alice excluding c_old_bob
  - Alice and Bob now check whether they agree on c_all XOR c_old_diff
    - If yes, then the sync protocol terminates.
    - If no, then Alice and Bob compute c_old_2 which is the XOR of all the ATX IDs from one month before.
      - For example, if c_old was the fingerprint of all ATXs from July to June, then c_old_2 would be the fingerprint of all ATXs from June to May.
      - Similarly, Alice and Bob then compute c_old_2_diff, and repeat.
- If it turns out that they disagree on more than N_1 timeframes (6 months with the recommended parameters), then they switch to serve mode.
- if in the last N_2 rounds (c_old, c_old_2, c_old_3 assuming that N_2=3) it turned out that Alice always had all the ATXs is the list of IDs that Bob was sending, then she can still assume that Bob is honest but he was isolated
  - Alice will request Bob to stop the sync flow that Bob has been transmitting to her.
  - Alice will switch to sending the ATXs to Bob in “serve mode”, which she will throttle appropriately.

Alice uses only her own timestamps (as written above “whenever the honest node receives an ATX that has never been seen before, it’s timestamp will be the latest, and it will be combined into the fingerprint of the latest group”). It would defeat the entire purpose to use anything other than Alice’s local timestamps, because the dishonest neighbor can always lie via “injected” ATXs and thereby prolong the sync (for the contextually-valid ballots we don’t have this problem, as described in the OP).

Indeed this needs to be specified with further details, let’s break it down:

Optimistic fast detection
- As the first step when starting their sync, Alice and Bob will send to each other the fingerprint of c_all_high_grade which was computed (assuming that c_old hasn’t been xor’d into c_all yet) as c_all_high_grade := c_all XOR c_old_high_grade where c_old_high_grade are only the ATXs with grade 2 and grade 1 that were collected into c_old
- See Grading ATXs for the Active Set regarding the grades (with hare3 this gets tweaked a little because there are more than 3 grade).
- If c_all_high_grade is the same, then Alice and Bob terminate (there’s no need to sync).
- An adversary can cause Alice and Bob to disagree on c_all_high_grade (even when Alice and Bob are both honest), by broadcasting a late ATX that receives grade 0 by some honest miners (hence this adversarial ATX might not be rewarded).
- It might be possible to improve this fast detection further, by committing values of of c_all_high_grade fingerprints into the mesh consensus, but let’s not worry about it for now.
- This entire “fast detection” step can be skipped altogether, but it’s better ro keep it because our design should be the most efficient during the happy flow.
If the 1st step failed, then Alice and Bob freeze their c_curr, and send the list of ATX IDs in the frozen c_curr to each other.
- Alice and Bob compute c_curr_union := c_curr_alice union_xor c_curr_bob (that is, the XOR of all the ATXs IDs in the union of the c_curr_alice and c_curr_bob)
- Alice and Bob compute c_all_new = c_all XOR c_curr_union and sends c_all_new to each other.
- If Alice and Bob agree on c_all_new then they terminate (there’s no need to sync).
If the 2nd step failed, Alice and Bob continue to sync the last N_1 months
- In each of the N_1 iterations
  - Alice and Bob wil XOR each previous month’s diff into c_all_new
  - Alice and Bob will send c_all_new to each other, and terminate if they agree on it.
  - If Alice received nothing new during N_2 iterations, she switches to serve mode (as specified above).

Notes:

If Bob is dishonest, then he can come up with a syntactically valid ATX that Alice doesn’t have during each of the N_2 iterations, namely by taking these valid ATXs from the recent ATXs that were broadcasted after c_curr was frozen, and pretending that he received those ATXs a long time ago.
- The full sync protocol should add a punishment that drops/blacklists Bob upon detecting this malicious behavior (the PoET ticks of the ATXs that Bob lie about are too high, and an honest ATX that was broadcasted after the frozen c_curr is likely to have a positioning ATX that’s more recent than what Bob claimed).
- Even without punishment, such ATXs that Bob sends will be considered as “already seen” by Alice (if she received these ATXs after she froze the c_curr fingerprint that she’s using for sync’ing with Bob), so this can go on for at most N_2 iterations.
The happy flow (specifically when everyone is honest) is one of the most important aspects, and here the happy is Alice and Bob sending to each other just a single fingerprint value (that they already calculated before), and then terminate immedaitely.
- We don’t want to use any sync protocol in which the happy flow is more burdensome than this.

ivan4th:

Hence the ATX timestamps have the property of being self-propagating across the network, and that may indeed be a problem. Let’s call the following “ATX stashing attack”. Let’s say there’s an attacker who controls a number of nodes that produce ATXs but do not publish them immediately. Instead, they stash them and hold them say for 6 months. After that, they start publishing the stashed ATXs with some moderate cadence using 6-month old timestamps. Each node the attacker’s nodes sync with will have to resort to the serve mode, but that’s not the main problem. The main problem is that each node that receives such an old ATX gets “infected” and will trigger serve mode during sync with its peers, even while being completely honest. With the ATX stash being big enough, it’s basically possible to hold the network in “always serve” mode most of the time, reducing its efficiency. And throttling serve mode may even make things worse.

Regarding this scenario:

Suppose that Alice and Bob are honest, and Carol is dishonest.
Carol sync with Alice, and will transmit to Alice an ATX that Carol supposedly received more than 6 months ago.
- Alice will trigger “serve” mode with Carol, but as mentioned this is unavoiding (Carol can always pretend to be a newcomer that has no data whatsoever and ask to be served).
Alice stores according to her own local timestampt the ATX that’s supposedly more than 6 months old that Carol sent, so this ATX is stored in the latest c_curr checkpoint of Alice.
When Alice and Bob sync later, the supposedly 6-months old ATX will be sync’d during the first iteration.
- Therefore it will not trigger “serve” mode between the honest Alice and the honest Bob.

As mentioned earlier:

iddo:

The idea behind what we need to implement is that a dishonest node can alway say “I don’t have data” (i.e., pretend to be a newcomer), and ask to honest neighbor to serve the data, but when the honest node knows that the requested mode is “serve mode” then the honest node can throttle the data that’s being served, as written in the OP:

iddo:

** not allowing more than N serve requests simultaneously
** not allowing repeated serve requests from the same neighbor (determined according to the neighbor’s IP address)
** using less bandwidth and lower priority for “serve” requests, relative to other communication.

We don’t have to implement all of these throttle rules in the first version of the sync protocol, but the protocol itself needs to operates in the best way when these rules are added later (when the value of Spacemesh becomes greater so that dishonest network nodes will try to attack by clogging the network).

Topic		Replies	Views
Verifiable shuffle Research	0	164	November 6, 2023
Deferred Block Fetching (a.k.a. Dangling Pointers) Research	17	697	June 26, 2023
Compact encoding for hare Research hare	15	101	July 17, 2024
Hare equivocations defense Research	11	620	June 9, 2023
Achieving consensus on epoch participants collection Research	1	312	January 11, 2023

Sync/serve protocols

Related topics