Issue:
https://github.com/waku-org/nwaku/issues/2921#issuecomment-2239277713
While analyzing Discv5 and the time it takes to find other peers, we observed in some of the simulations that some messages were being delivered twice (more info: https://www.notion.so/Message-hash-duplication-d59f6133a2e341398064562d7a4c74f2).
To investigate this further, we raised the node log level from INFO to TRACE. We then started seeing a different issue: some nodes were losing messages.
We were able to reproduce this issue in nWaku v0.30 and v0.31. While analyzing it further, we discovered that messages are only lost when the log level is set to TRACE.
While investigating this issue, we discovered a leak in Yamux: Yamux issue (Solved). We are still not sure whether it is related, because the message loss also happens with mplex and, as far as we have investigated, the leak does not affect mplex.
To check whether all nodes received all messages, we use an image that logs the sent and received relay message events. The images are based on waku v0.30 and waku v0.31:
soutullostatus/nwaku-jq-curl:v0.30.1-msg-log-final-short
(git commit hash: v0.30.1-9-g96d0b4). Message sent/received logging over 0.30.1 with short peer_ids.
Timestamp: [2024-07-23T15:40:00, 2024-07-23T15:47:00]
soutullostatus/nwaku-jq-curl:v0.31.0-msg-log
(git commit hash: v0.31.0-rc.1-11-g12db97). Message sent/received logging over 0.31.0 with short peer_ids.
soutullostatus/nwaku-jq-curl:v0.31.0-msg-log-extra-2
(git commit hash: v0.31.0-rc.1-13-gc716d4)
Timestamp: [2024-07-26T06:51:00, 2024-07-26T06:59:00]
[mplex version failing]
On top of these versions, we only add jq and curl, which we need for some checks run from Kubernetes. This is why we don't use the images from harbor/wakuorg directly.
We deploy 3 bootstrap nodes and 100 waku nodes. The bootstrap nodes have relay disabled, as they only serve as bootstrap nodes for discv5.
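As a rough illustration of the per-node check described above, the following Python sketch takes already-parsed sent/received relay events and reports, for each relay node, which messages it never received and which it received more than once (covering both the loss and the duplication symptoms). It assumes, hypothetically, that each log line has already been parsed into a `RelayEvent` holding the logging node's short peer_id, an event kind of "sent" or "received", and the message hash, and that only the publishing node emits a "sent" event for a given hash. `RelayEvent` and `delivery_report` are illustrative names, not part of nwaku, and the log parsing itself is left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayEvent:
    node: str      # short peer_id of the node that wrote the log line (hypothetical field)
    kind: str      # "sent" or "received" (hypothetical values)
    msg_hash: str  # hash of the Waku message


def delivery_report(events, relay_nodes):
    """Return (missing, duplicated): dicts mapping node -> set of message hashes."""
    # Assumption: only the publisher logs a "sent" event for a given hash.
    published = {e.msg_hash: e.node for e in events if e.kind == "sent"}

    # Count how many times each node received each message hash.
    received = defaultdict(Counter)
    for e in events:
        if e.kind == "received":
            received[e.node][e.msg_hash] += 1

    missing, duplicated = {}, {}
    for node in relay_nodes:
        counts = received[node]
        # Every relay node should receive every message it did not publish itself.
        expected = {h for h, sender in published.items() if sender != node}
        lost = expected - set(counts)
        dups = {h for h, c in counts.items() if c > 1}
        if lost:
            missing[node] = lost
        if dups:
            duplicated[node] = dups
    return missing, duplicated
```

For the deployment above, `relay_nodes` would hold only the 100 relay-enabled node IDs; the 3 bootstrap nodes are left out because they have relay disabled and are not expected to receive relay messages.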