Apache NiFi / NIFI-4772

If several Processors do not return from their @OnScheduled methods, NiFi will stop scheduling any Processors


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: Core Framework
    • Labels: None

    Description

      If a Processor does not properly return from its @OnScheduled method and several instances of that Processor are started, we can get into a state in which no Processors can start. Log messages like the following then appear:

      2018-01-10 10:16:31,433 WARN [StandardProcessScheduler Thread-1] o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled of 'UpdateAttribute' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not guarantee that the task will be canceled since the code inside current OnScheduled operation may have been written to ignore interrupts which may result in a runaway thread. This could lead to more issues, eventually requiring NiFi to be restarted. This is usually a bug in the target Processor 'UpdateAttribute[id=95423ee6-e6a6-1220-83ad-af20577063bd]' that needs to be documented, reported and eventually fixed.
      2018-01-10 10:16:42,937 WARN [StandardProcessScheduler Thread-2] o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled of 'PutHDFS' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not guarantee that the task will be canceled since the code inside current OnScheduled operation may have been written to ignore interrupts which may result in a runaway thread. This could lead to more issues, eventually requiring NiFi to be restarted. This is usually a bug in the target Processor 'PutHDFS[id=25e531ec-d873-1dec-acc9-ea745e7869ed]' that needs to be documented, reported and eventually fixed.
      2018-01-10 10:16:46,993 WARN [StandardProcessScheduler Thread-4] o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled of 'LogAttribute' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not guarantee that the task will be canceled since the code inside current OnScheduled operation may have been written to ignore interrupts which may result in a runaway thread. This could lead to more issues, eventually requiring NiFi to be restarted. This is usually a bug in the target Processor 'LogAttribute[id=9a683a06-aa24-19b5-ffff-ffff944a0216]' that needs to be documented, reported and eventually fixed.
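      The warnings above describe a timed wait on @OnScheduled followed by a cancellation attempt through Thread.interrupt(). The plain-JDK sketch below is not NiFi's scheduler code; it only mimics that pattern with Future.get(timeout) and Future.cancel(true). The class and method names, the single shared worker thread, and the 100 ms timeout / 2 s pause are all illustrative assumptions. A task that honors the interrupt frees its thread, while one that swallows the interrupt keeps the thread busy, so the next queued task cannot run.

```java
import java.util.concurrent.*;

// Illustrative sketch only (hypothetical names, not NiFi's scheduler):
// a timed wait on a lifecycle task, then a cancellation attempt via interrupt.
public class OnScheduledTimeoutDemo {

    /** Hypothetical stand-in for an @OnScheduled body that pauses for pauseMillis. */
    static Runnable pausingTask(long pauseMillis, boolean honorInterrupts) {
        return () -> {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(pauseMillis);
            while (deadline - System.nanoTime() > 0) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    if (honorInterrupts) {
                        // Well-behaved: restore the flag and end early by completing exceptionally.
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("interrupted during @OnScheduled", e);
                    }
                    // Misbehaving: swallow the interrupt and keep pausing (a "runaway thread").
                }
            }
        };
    }

    /**
     * Runs the task on a single worker thread, waits up to 100 ms, then calls cancel(true),
     * which interrupts the worker. Returns true if a second task can use the worker shortly after.
     */
    static boolean workerFreedAfterTimeout(boolean honorInterrupts) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<?> onScheduled = worker.submit(pausingTask(2_000, honorInterrupts));
            try {
                onScheduled.get(100, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                onScheduled.cancel(true); // an interrupt is only a request; the task may ignore it
            }
            Future<?> nextProcessor = worker.submit(() -> { });
            try {
                nextProcessor.get(200, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the worker is still occupied by the runaway task
            }
        } finally {
            worker.shutdownNow();
            worker.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("honors interrupts, worker freed: " + workerFreedAfterTimeout(true));
        System.out.println("ignores interrupts, worker freed: " + workerFreedAfterTimeout(false));
    }
}
```

      Running main should report true for the interrupt-honoring variant and false for the interrupt-ignoring one. The second call takes about two seconds, because the runaway task must finish its pause before the executor can terminate.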
      

      While we should avoid having misbehaving Processors to begin with, the framework must also tolerate them and must not allow one misbehaving Processor to affect other Processors.

      We can reproduce an approximation of this issue with the following steps:
      1. Create 1 DebugFlow Processor. Auto-terminate its success & failure relationships. Set the "@OnScheduled Pause Time" property to "2 mins"
      2. Copy & paste this DebugFlow Processor so that there are at least 8 of them.
      3. Create a GenerateFlowFile Processor and an UpdateAttribute Processor. Send success of GenerateFlowFile to UpdateAttribute.
      4. Start all of the DebugFlow Processors.
      5. Start the GenerateFlowFile and UpdateAttribute Processors.
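
      The steps above can be approximated by a scaled-down, plain-JDK simulation. It assumes, for illustration only, that start-up work shares a fixed pool of four threads; the log above shows thread names up to "StandardProcessScheduler Thread-4", but the actual pool size and scheduling logic in NiFi are not stated in this issue. Eight tasks pause for 200 ms each (standing in for the 2-minute "@OnScheduled Pause Time"), and a ninth, well-behaved task records how long it waited for a thread. The class name StarvationDemo and its constants are hypothetical.

```java
import java.util.concurrent.*;

// Illustrative simulation only (hypothetical names and sizes, not NiFi code).
public class StarvationDemo {
    static final int POOL_SIZE = 4;        // assumption: a small fixed pool of lifecycle threads
    static final int PAUSING_TASKS = 8;    // mirrors "at least 8" DebugFlow copies
    static final long PAUSE_MILLIS = 200;  // scaled down from the 2-minute pause

    /** Returns how long (ms) a well-behaved start task waited before it got a thread. */
    static long healthyStartDelayMillis() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        try {
            for (int i = 0; i < PAUSING_TASKS; i++) {
                pool.submit(() -> {
                    Thread.sleep(PAUSE_MILLIS); // stand-in for DebugFlow's @OnScheduled pause
                    return null;
                });
            }
            long submitted = System.nanoTime();
            // Stand-in for starting GenerateFlowFile/UpdateAttribute: queued behind the pauses.
            Future<Long> healthy = pool.submit(() -> System.nanoTime() - submitted);
            return TimeUnit.NANOSECONDS.toMillis(healthy.get());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("healthy start delayed by ~" + healthyStartDelayMillis() + " ms");
    }
}
```

      With four threads and eight pausing tasks, the well-behaved task should wait roughly two pause periods (about 400 ms), which mirrors GenerateFlowFile and UpdateAttribute not starting until the DebugFlow pauses have elapsed.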

      In this scenario we will not see the above log messages, because after 1 minute the DebugFlow Processor is interrupted and its @OnScheduled method completes exceptionally. However, GenerateFlowFile and UpdateAttribute still do not start running until the 2-minute window has elapsed. If DebugFlow instead ignored the interrupt and did not complete exceptionally, GenerateFlowFile and UpdateAttribute would never start running, and we would see the above warnings in the log.

People

    • Assignee: Mark Payne (markap14)
    • Reporter: Mark Payne (markap14)
    • Votes: 0
    • Watchers: 4
