Overview
This is a test plan to verify the behaviour of the store sync protocol and identify its functional limits in various scenarios. The store sync protocol has been implemented to improve reliability in the network by attempting to keep the messages in store node DBs synchronised. A store node can miss messages for a variety of reasons, and it might not be aware that it has missed any. The store sync protocol enables a node to compare a subset of its recent message hashes with those of other nodes and identify message hashes that may be missing.
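The comparison described above can be illustrated with a minimal sketch. It is a naive set difference over a time window, not the protocol's actual reconciliation mechanism or wire format. The function name `missing_hashes`, the hash-to-timestamp dictionaries and the window bounds are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def missing_hashes(
    local: Dict[bytes, int],
    remote: Dict[bytes, int],
    range_start: int,
    range_end: int,
) -> Set[bytes]:
    """Return hashes the remote node holds in [range_start, range_end)
    that the local node does not hold in the same window.

    Both inputs map message hash -> message timestamp (hypothetical layout).
    """
    local_window = {h for h, ts in local.items() if range_start <= ts < range_end}
    remote_window = {h for h, ts in remote.items() if range_start <= ts < range_end}
    return remote_window - local_window


# Illustrative data: the local node missed message "c";
# message "d" falls outside the compared window and is ignored.
local_store = {b"a": 10, b"b": 20}
remote_store = {b"a": 10, b"b": 20, b"c": 25, b"d": 100}
missing = missing_hashes(local_store, remote_store, 0, 50)
```

In a test, an empty result corresponds to UC1 and a non-empty result to the use cases in which a node has missed messages.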
Use cases
- UC1: No Missing Messages: All store nodes receive and archive the same relay messages within the store-sync-interval. The result of the synchronisation process is that no nodes have missing message hashes.
- UC2: Node Misses Some Messages While Online: A node has been online but missed some messages for some reason. The synchronisation process identifies that there are missing message hashes.
- UC3: Node Goes Offline and Misses Messages: A node goes offline and misses messages. When it comes back online the synchronisation process identifies that there are missing message hashes.
- UC4: Majority of Nodes Miss Messages: Most nodes go offline and miss messages. The synchronisation process for each node should eventually identify the missing message hashes.
Some general questions to answer
- How does an increased relay message rate impact store sync?
- How does an increased query rate on store nodes impact store sync (for example, when the DB is too busy processing queries to respond to store sync queries)?
- How long does it take to sync all store nodes in various scenarios?
- How does store sync behave in edge cases where syncing occurs before all new messages have been relayed to the syncing node?
- Is the 20s jitter offset for the sync range sufficient?
- What is the optimal number of peers to sync with to ensure missed messages are identified and synced?
- What is the resource utilization of the sync protocol in terms of disk, CPU and memory?
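To make the jitter question concrete, the sketch below shows one way a jitter offset could shape the sync range. It assumes the offset shifts the end of the window back from the current time, so that messages still propagating over relay are not yet compared. That interpretation, the function names and the 300 s range are assumptions for illustration only; only the 20 s value comes from this plan.

```python
from typing import Tuple


def sync_window(
    now_s: float, sync_range_s: float, jitter_s: float = 20.0
) -> Tuple[float, float]:
    """Return (start, end) of the time window to compare, in seconds.

    Assumption: the window ends jitter_s before now and spans sync_range_s.
    """
    end = now_s - jitter_s
    start = end - sync_range_s
    return start, end


def in_window(timestamp_s: float, window: Tuple[float, float]) -> bool:
    """True if a message timestamp falls inside the half-open window [start, end)."""
    start, end = window
    return start <= timestamp_s < end


# Hypothetical values: current time 1000 s, sync range 300 s.
window = sync_window(now_s=1000.0, sync_range_s=300.0)
recent_excluded = not in_window(990.0, window)  # sent 10 s ago, still inside the jitter offset
older_included = in_window(900.0, window)       # sent 100 s ago, inside the window
```

Under this assumption, a message that takes longer than the jitter offset to reach a node would be reported as missing even though it is still in flight, which is the edge case the questions above are meant to probe.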
Metrics that may be useful to answer the above questions