Uploaded image for project: 'Apache NiFi'
  1. Apache NiFi
  2. NIFI-10279

Increase timeouts in automated tests around NiFi Stateless

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.17.0
    • Core Framework, Extensions
    • None

    Description

      NiFi Stateless tends to be very fast. However, on constrained resources like Github Actions, the setup (such as unpacking nars) can take a while. This causes intermittent timeouts.

      For example, in the Stateless System Tests, several tests have a timeout of 10 seconds. This is generally plenty of time to run on a laptop that's not overly taxed. For example, on a 2019 Macbook Pro, the tests run in about 2 seconds for me. However, we had a period of about 1 week in which Github Actions was frequently failing with an error such as:

      Error:  org.apache.nifi.stateless.basics.AsyncCommitCallbackIT.testSynchronousCommitCleanupAfterFlowFilesTerminated  Time elapsed: 10.067 s  <<< ERROR!
      300 

      A timeout of 10 seconds is too restrictive for this environment.

      Similarly, the ExecuteStateless unit test can take a while when it must unpack NARs. We occasionally see the following error in the ci-workflow build due to a unit test failure:

      Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 83.578 s <<< FAILURE! - in org.apache.nifi.processors.stateless.TestExecuteStateless
      org.apache.nifi.processors.stateless.TestExecuteStateless.testRouteToFailureWithInput  Time elapsed: 72.185 s  <<< FAILURE!
      org.opentest4j.AssertionFailedError: expected: <3> but was: <1>
          at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
          at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
          at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
          at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
          at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:527)
          at org.apache.nifi.util.StandardProcessorTestRunner.assertTransferCount(StandardProcessorTestRunner.java:360)
          at org.apache.nifi.processors.stateless.TestExecuteStateless.testRouteToFailureWithInput(TestExecuteStateless.java:104)
      ... 

      At first glance, it appears to indicate that data was improperly processed. But this occurs only occasionally. This appears to be due the fact that the ordering of tests is not guaranteed and can change from invocation to invocation. If this test (testRouteToFailureWithInput) is the first test to be run, this test can fail. This is because of this line of code in the test:

      runner.run(3, true, true, 60000L); 

      It specifies that the processor should run 3 iterations and times out after 60,000 milliseconds. If we analyze the log output, we see that indeed this was the first test run and this log line attracts a lot of attention:

      2022-07-26 03:24:38:527 +0900 [pool-2-thread-1] INFO org.apache.nifi.nar.NarUnpacker - NAR loading process took 51455278686 nanoseconds (51 seconds).

      So here we can see that unpacking NARs took 51 seconds. Indeed, updating the test to have a timeout of 100 milliseconds (a very short timeout that we expect to trigger) produces the exact same output for me locally.

      We need to use a longer timeout here, as well.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            markap14 Mark Payne
            markap14 Mark Payne
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 20m
                20m

                Slack

                  Issue deployment