Issue:

https://github.com/waku-org/nwaku/issues/2921#issuecomment-2239277713

TL;DR:

While analyzing Discv5 and the time it takes to find other peers, we observed in some of the simulations that some messages were being delivered twice (more info: https://www.notion.so/Message-hash-duplication-d59f6133a2e341398064562d7a4c74f2).

To investigate this further, we changed the node log level from INFO to TRACE. We then started seeing a different issue: some nodes were losing messages.

We were able to reproduce this issue in nWaku v0.30 and v0.31. On further analysis, we found that the message loss only happens when the log level is set to TRACE.

While investigating this issue, we discovered a leak in Yamux (see the Yamux issue, now solved). We are still not sure whether the leak is related to the message loss. However, the message loss also happens with mplex, and as far as we have investigated, the leak does not affect mplex.

Extended report

To check whether all nodes received all messages, we are using images that log the relay message events for both received and sent messages. The images are based on waku v0.30 and waku v0.31:

On top of these images, we only add jq and curl for some checks we need to run from Kubernetes. This is why we don’t use the images from harbor/wakuorg directly.

We are deploying 3 bootstrap nodes and 100 waku nodes. The bootstrap nodes have relay disabled, as they only serve to bootstrap discv5.
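
The check described above (did every relay node receive every message, and did any node receive the same message more than once?) can be sketched roughly as follows. This is a minimal illustration, not the tooling we actually use: it assumes the "received" events have already been extracted from the node logs into (node_id, msg_hash) pairs, and the function name, node names and hashes are all hypothetical.

```python
from collections import Counter


def analyze_delivery(events, expected_nodes):
    """Find lost and duplicated messages.

    events: list of (node_id, msg_hash) pairs, one per "received" log
            entry (assumed to be pre-extracted from the node logs).
    expected_nodes: set of node ids that should receive every message
            (the relay nodes; bootstrap nodes are excluded because
            they have relay disabled).
    """
    # How many times each node logged each message hash.
    counts = Counter(events)

    # A (node, hash) pair seen more than once is a duplicate delivery.
    duplicates = {pair: n for pair, n in counts.items() if n > 1}

    # Group receivers per message hash.
    receivers = {}
    for node, msg_hash in counts:
        receivers.setdefault(msg_hash, set()).add(node)

    # A message is lost for every expected node that never logged it.
    missing = {}
    for msg_hash, nodes in receivers.items():
        lost = expected_nodes - nodes
        if lost:
            missing[msg_hash] = lost

    return missing, duplicates


if __name__ == "__main__":
    # Hypothetical data: 3 relay nodes, 2 messages.
    relay_nodes = {"node-0", "node-1", "node-2"}
    sample = [
        ("node-0", "0xaaa"), ("node-1", "0xaaa"), ("node-2", "0xaaa"),
        ("node-0", "0xbbb"), ("node-0", "0xbbb"), ("node-1", "0xbbb"),
    ]
    missing, duplicates = analyze_delivery(sample, relay_nodes)
    print("missing:", missing)
    print("duplicates:", duplicates)
```

Note that a message that no node received at all would not show up in this sketch. Catching that case would require also comparing against the hashes from the "sent" events.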