Waku v0.36 Discv5 analysis over time

Purpose: Examine how Discv5 bandwidth usage evolves in experiments longer than the short runs we typically do. This follows from noticing high Discv5 usage in Waku nodes.

See reports from:

Problem

During the experiments for the Waku v0.36 regression testing report (linked above), Discv5 bandwidth seemed high. We also saw significant skew, with outlier nodes handling most of the bandwidth.

Direct link to Discv5 section: Libp2p bandwidth vs Discv5 bandwidth:

On the Grafana dashboards, there also appears to be a discrepancy between Discv5 [In] and Discv5 [Out] bandwidth usage.

Findings

The discrepancy between Discv5 [In] and Discv5 [Out] disappears when filtering bootstrap nodes.
The skewed distribution of bandwidth usage between nodes is a result of how we bring up the network and of the network being sparse. The bootstrap node is the first point of contact for relay nodes as they are brought up, so it fills with the earlier nodes and then hands those nodes out to the new nodes that come up later.
The NodeIds being generated follow the expected distribution: roughly uniform. secp256k1 public keys are not exactly uniform.
As we increase the number of nodes, median Discv5 usage decreases. This is expected, as more and more nodes are not added to the two main buckets.
We should probably be looking at totals instead of just averages for Discv5 (and other stats).
Total Discv5 usage is not too high. It looks to be in range if we extrapolate from 30 nodes.
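
The bucket argument in the findings above can be illustrated with a minimal sketch. It is not taken from the experiment code. It assumes uniformly random 256-bit NodeIds, a log-distance bucket defined as the bit length of the XOR of two IDs, and a bucket capacity of 16 (BUCKET_SIZE is an assumed value, check what your client uses). All function names are hypothetical. Under these assumptions, roughly half of all peers fall in the farthest bucket and a quarter in the next one, so as the network grows a shrinking share of peers fits in the table.

```python
import random
from collections import Counter
from typing import Iterable

ID_BITS = 256      # Discv5 NodeIds are 256-bit values
BUCKET_SIZE = 16   # ASSUMPTION: k-bucket capacity; verify against your client


def log_distance(a: int, b: int) -> int:
    """Bit length of a XOR b: 0 for identical IDs, up to 256 otherwise."""
    return (a ^ b).bit_length()


def bucket_occupancy(local_id: int, peer_ids: Iterable[int]) -> Counter:
    """Count how many peers would land in each log-distance bucket (uncapped)."""
    return Counter(log_distance(local_id, p) for p in peer_ids if p != local_id)


def fraction_stored(occupancy: Counter, bucket_size: int = BUCKET_SIZE) -> float:
    """Share of peers that fit if every bucket holds at most bucket_size entries."""
    total = sum(occupancy.values())
    kept = sum(min(n, bucket_size) for n in occupancy.values())
    return kept / total if total else 0.0


if __name__ == "__main__":
    rng = random.Random(36)  # fixed seed so runs are repeatable
    for n_nodes in (30, 100, 300, 1000):
        ids = [rng.getrandbits(ID_BITS) for _ in range(n_nodes)]
        occ = bucket_occupancy(ids[0], ids[1:])  # ids[0] plays the local node
        top_two = occ[ID_BITS] + occ[ID_BITS - 1]
        print(n_nodes,
              f"top-two share={top_two / (n_nodes - 1):.2f}",
              f"stored={fraction_stored(occ):.2f}")
```

The sketch ignores eviction, liveness checks, and the order in which nodes join, so it only shows the capacity effect, not the bring-up skew seen at the bootstrap node.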

Oddities

Nodes appeared in buckets 38, 39, and 40 based on the public keys scraped from an experiment, whereas the theoretically expected buckets are 49, 50, and 51.
Churn doesn’t seem to have much of an effect on Discv5 usage.
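
One way to re-check the bucket numbers is to recompute the buckets directly from scraped NodeIds and compare them with the share expected for uniform IDs. The following is a hypothetical sketch, not the script used for the numbers above. It assumes NodeIds are available as hex strings and that "bucket d" means log distance d (bit length of the XOR). That numbering is an assumption and may not be the convention behind the bucket indices quoted above, so the two are only comparable once the convention is confirmed.

```python
from collections import Counter
from fractions import Fraction
from typing import Iterable

ID_BITS = 256  # Discv5 NodeIds are 256-bit values


def parse_node_id(hex_id: str) -> int:
    """Parse a NodeId given as a hex string, with or without a 0x prefix."""
    return int(hex_id, 16)


def observed_buckets(local_hex: str, peer_hexes: Iterable[str]) -> Counter:
    """Histogram of log distances from the local node to each scraped peer."""
    local = parse_node_id(local_hex)
    return Counter((local ^ parse_node_id(p)).bit_length() for p in peer_hexes)


def expected_share(bucket: int) -> Fraction:
    """Probability that a uniformly random NodeId sits at log distance `bucket`."""
    if not 1 <= bucket <= ID_BITS:
        raise ValueError("bucket must be in 1..256")
    return Fraction(1, 2 ** (ID_BITS + 1 - bucket))


def compare(local_hex: str, peer_hexes: list) -> list:
    """Rows of (bucket, observed count, expected count) for non-empty buckets."""
    obs = observed_buckets(local_hex, peer_hexes)
    n = len(peer_hexes)
    return [(d, obs[d], float(n * expected_share(d))) for d in sorted(obs)]
```

Calling compare() with one node's ID and the IDs of all other scraped nodes gives a per-bucket table that can be set against what the client reports.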