Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When working on HDDS-5891, I found that TestOzoneRpcClientAbstract#testZReadKeyWithUnhealthyContainerReplia is remarkably slow.
This abstract test class is extended by three test classes, one of which explicitly disables this test case.
For instance, when I ran it locally, the entire TestOzoneRpcClient took 2 min 15 sec, of which testZReadKeyWithUnhealthyContainerReplia alone took 2 min 9 sec. I assume it would take even longer on GitHub Actions machines. Most of the other 70+ test cases in this class finish in tens of milliseconds each.
In that 2 min 9 sec run, ~90 seconds were spent waiting for the DN to be stopped:
2021-11-09 18:03:57,217 [Time-limited test] INFO ozone.MiniOzoneClusterImpl (MiniOzoneClusterImpl.java:lambda$waitForHddsDatanodesStop$3(389)) - Waiting on 3 datanodes out of 2 to be marked unhealthy.
...
2021-11-09 18:05:28,622 [Time-limited test] INFO ozone.MiniOzoneClusterImpl (MiniOzoneClusterImpl.java:lambda$waitForHddsDatanodesStop$3(389)) - Waiting on 3 datanodes out of 2 to be marked unhealthy.
Changes:
1. Setting "ozone.scm.stale.node.interval" to 10s for this test alone (TestReconTasks already does the same) reduced run time from 69s to 39s, saving 60s (x2 = 120s for both test classes).
2. Moving the extra 5000ms sleep into the GenericTestUtils.waitFor() timeout saved another 5s.
3. Fixed the typo in this test case's method name ("Replia" -> "Replica").
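To illustrate why change 2 helps, here is a minimal standalone sketch of polling with a deadline instead of sleeping for a fixed time. It is not the actual patch. PollingWaitSketch and its waitFor() are hypothetical stand-ins written for this example; they only mimic the idea behind GenericTestUtils.waitFor() and do not reproduce its signature or exception behavior. The point is that the extra 5000ms becomes part of an upper bound, so the test returns as soon as the condition holds rather than always paying the full sleep.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Ozone or Hadoop.
public final class PollingWaitSketch {

  /**
   * Polls {@code check} every {@code checkEveryMillis} until it returns true
   * or {@code waitForMillis} has elapsed.
   *
   * @return true if the condition was met, false on timeout or interruption
   */
  static boolean waitFor(BooleanSupplier check, long checkEveryMillis,
      long waitForMillis) {
    final long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // upper bound reached without the condition holding
      }
      try {
        Thread.sleep(checkEveryMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    final long start = System.nanoTime();
    // Simulated condition: becomes true roughly 200 ms from now.
    final long readyAt = start + 200L * 1_000_000L;

    // The timeout includes the former "extra" 5000 ms as headroom,
    // but the call returns shortly after the condition becomes true.
    boolean met = waitFor(() -> System.nanoTime() - readyAt >= 0, 50, 5000);

    long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
    System.out.println("met=" + met + ", elapsed ~" + elapsedMillis + " ms");
  }
}
```

With an unconditional Thread.sleep(5000) the same run would cost at least 5s regardless of when the condition became true.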