Chat Protocol Benchmarks

Status backend commit hash: e6bfc779

Service nodes nwaku version: v0.34.0-rc1

Results summary:

Scalability has been an issue in every set of experiments performed.

There are two types of scenarios:

  1. Deployment goes OK:
    1. Friend/community requests are correctly sent and accepted.
    2. Messages are sent and received by all nodes.
    3. One-to-one messages don’t appear in other nodes.
    4. Etc.
  2. Deployment fails:
    1. Friend/community requests are lost, so the experiment needs to be restarted.

The second scenario starts to appear once we reach roughly 25–40 status-desktop nodes, at which point request handling becomes very inconsistent.

This was further explored in Subscription Performance.

We saw that the time it took a node to start receiving messages after logging in increased with the number of nodes.
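The latency described above could be measured per node as the time between triggering a login and receiving the first message. The sketch below is illustrative only: the `client` object and its `login()` / `wait_for_first_message()` methods are hypothetical placeholders for whatever API the actual status-backend test harness exposes.

```python
import time

def measure_login_latency(client):
    """Return the seconds elapsed between login and the first received message.

    `client` is assumed to expose blocking login() and
    wait_for_first_message() calls; both names are placeholders.
    """
    start = time.monotonic()
    client.login()
    client.wait_for_first_message()  # blocks until a message arrives
    return time.monotonic() - start

# Demo with a fake client that simulates a 10 ms delivery delay.
class FakeClient:
    def login(self):
        pass

    def wait_for_first_message(self):
        time.sleep(0.01)

latency = measure_login_latency(FakeClient())
```

Collecting this value across runs with increasing node counts is what exposes the growth reported above.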

We think this is also related to the scenarios where the higher the number of nodes, the harder it becomes to accept community/friend requests. Status-backend makes heavy use of the store node, issuing a large number of requests; this could lead to problems such as:

  1. Saturating the store node with requests, slowing its responses.
  2. Saturating the store node with requests, causing it to disconnect nodes.

Either case would lead to the following responses when trying to accept community/friend requests:

> 💡 `{'code': -32000, 'message': 'record not found'}` or `chat not found`
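Since these errors appear to be transient (the record eventually becomes available once the store node catches up), one mitigation in the test harness would be to retry the RPC call while it returns one of the error messages above. This is a minimal sketch under assumptions: the error shapes are the ones observed in the experiments, but `call_fn` and the accept-request call it stands in for are placeholders, not the real status-backend client API.

```python
import time

# Error messages observed when the store node is saturated; treated
# here as transient and therefore retryable.
TRANSIENT_ERRORS = ("record not found", "chat not found")

def call_with_retry(call_fn, *args, retries=5, delay=0.0):
    """Retry call_fn while it returns a transient 'not found' error.

    call_fn is a placeholder for any JSON-RPC call returning a dict
    with either a 'result' or an 'error' key.
    """
    last_error = None
    for _ in range(retries):
        result = call_fn(*args)
        error = result.get("error")
        if error is None:
            return result
        if error.get("message") not in TRANSIENT_ERRORS:
            raise RuntimeError(f"non-transient RPC error: {error}")
        last_error = error
        time.sleep(delay)  # back off before asking the store again
    raise RuntimeError(f"gave up after {retries} retries: {last_error}")

# Demo with a fake endpoint that fails twice before succeeding,
# mimicking a store node that eventually serves the missing record.
attempts = {"n": 0}

def fake_accept_request(_request_id):
    attempts["n"] += 1
    if attempts["n"] < 3:
        return {"error": {"code": -32000, "message": "record not found"}}
    return {"result": "accepted"}

outcome = call_with_retry(fake_accept_request, "req-1")
```

A retry loop like this would only mask the symptom, of course; it does not address the underlying store-node saturation.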

In Store Performance specifically, there were runs in which, after logging in, some nodes did not receive messages from the store. These experiments were discarded and repeated to obtain results, but the issue occurred in both relay and light nodes.

We are fairly confident this is not a lab-capacity issue, since the lab metrics show low resource usage for this number of nodes. Moreover, the request problem persists even in experiments without message injection, while in experiments with message injection and higher bandwidth usage, messages are delivered without problems.