While running dtests on a c5.9xlarge (32 cores, 64 GB of memory), some tests are failing because the nodes cannot handle the load that cassandra-stress generates for them.
As one example, there is this test (1), which checks that a cluster is able to cope with a bootstrapping node. The problem is that node1 is hammered by cassandra-stress until it is eventually killed, so the test fails before it even gets to what it is actually meant to verify.
I was told that dtests in CircleCI run in containers with 8 cores and 16 GB of RAM, and I simulated this on my machine (-Dcassandra.available_processors=8). The core problem is that the nodes do not have enough memory - Xmx and Xms are set to only 512 MB, which is a very low figure, so they are eventually killed. I see a few options:
1) Run dtests on less powerful machines, so that cassandra-stress cannot generate enough load to get the underlying nodes killed (a rather strange idea).
2) Increase the memory per node - this should be configurable. I saw that 1 GB helps but some timeouts remain; 2 GB is better, and 4 GB would be best.
3) Fix the tests so that they do not fail with 512 MB.
Option 1) is not viable to me. Option 3) would take a lot of time, would not actually solve anything, and going through all the tests to tune them individually would be cumbersome and clunky. Option 2) seems to be the best approach, but I am not aware of any way to add more memory to every node at once, since node and cluster creation / startup is scattered all over the project.
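To make option 2) concrete, a minimal sketch of what a single shared hook could do: cassandra-env.sh honours the MAX_HEAP_SIZE and HEAP_NEWSIZE environment variables when they are exported, so one helper could build the environment for every node started by a test. The helper name and the quarter-of-heap new-gen rule below are assumptions for illustration, not existing dtest code.

```python
import os

def node_start_env(heap_mb, base_env=None):
    """Build the environment a Cassandra node process should start with.

    cassandra-env.sh picks up MAX_HEAP_SIZE (and requires HEAP_NEWSIZE
    alongside it) from the environment; setting both here means no test
    has to touch JVM options itself. The 1:4 new-gen ratio is a common
    rule of thumb, assumed here for illustration.
    """
    env = dict(base_env if base_env is not None else os.environ)
    env['MAX_HEAP_SIZE'] = '%dM' % heap_mb
    env['HEAP_NEWSIZE'] = '%dM' % max(heap_mb // 4, 100)
    return env

# e.g. start every node with a 2 GB heap instead of the 512 MB default
env = node_start_env(2048, base_env={})
print(env['MAX_HEAP_SIZE'], env['HEAP_NEWSIZE'])  # 2048M 512M
```

If this lived in one place (e.g. in CCM's node start path), changing the heap for the whole run would be a one-line change instead of edits scattered across tests.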
I have raised the issue here (2) too.
Do you think that, if we manage to fix this in CCM, we could introduce a switch / flag to dtests specifying how much memory each node in a cluster should run with?