Details
-
Task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
NiFi Stateless tends to be very fast. However, on constrained resources like Github Actions, the setup (such as unpacking nars) can take a while. This causes intermittent timeouts.
For example, in the Stateless System Tests, several tests have a timeout of 10 seconds. This is generally plenty of time to run on a laptop that's not overly taxed. For example, on a 2019 Macbook Pro, the tests run in about 2 seconds for me. However, we had a period of about 1 week in which Github Actions was frequently failing with an error such as:
Error: org.apache.nifi.stateless.basics.AsyncCommitCallbackIT.testSynchronousCommitCleanupAfterFlowFilesTerminated Time elapsed: 10.067 s <<< ERROR! 300
A timeout of 10 seconds is too restrictive for this environment.
Similarly, the ExecuteStateless unit test can take a while when it must unpack NARs. We occasionally see the following error in the ci-workflow build due to a unit test failure:
Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 83.578 s <<< FAILURE! - in org.apache.nifi.processors.stateless.TestExecuteStateless org.apache.nifi.processors.stateless.TestExecuteStateless.testRouteToFailureWithInput Time elapsed: 72.185 s <<< FAILURE! org.opentest4j.AssertionFailedError: expected: <3> but was: <1> at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62) at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150) at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145) at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:527) at org.apache.nifi.util.StandardProcessorTestRunner.assertTransferCount(StandardProcessorTestRunner.java:360) at org.apache.nifi.processors.stateless.TestExecuteStateless.testRouteToFailureWithInput(TestExecuteStateless.java:104) ...
At first glance, it appears to indicate that data was improperly processed. But this occurs only occasionally. This appears to be due the fact that the ordering of tests is not guaranteed and can change from invocation to invocation. If this test (testRouteToFailureWithInput) is the first test to be run, this test can fail. This is because of this line of code in the test:
runner.run(3, true, true, 60000L);
It specifies that the processor should run 3 iterations and times out after 60,000 milliseconds. If we analyze the log output, we see that indeed this was the first test run and this log line attracts a lot of attention:
2022-07-26 03:24:38:527 +0900 [pool-2-thread-1] INFO org.apache.nifi.nar.NarUnpacker - NAR loading process took 51455278686 nanoseconds (51 seconds).
So here we can see that unpacking NARs took 51 seconds. Indeed, updating the test to have a timeout of 100 milliseconds (a very short timeout that we expect to trigger) produces the exact same output for me locally.
We need to use a longer timeout here, as well.
Attachments
Issue Links
- links to