Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
If a Processor does not properly return from its @OnScheduled method and several instances of the processor are started, we can get into a state where no Processors can start. We start seeing log messages like the following:
2018-01-10 10:16:31,433 WARN [StandardProcessScheduler Thread-1] o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled of 'UpdateAttribute' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not guarantee that the task will be canceled since the code inside current OnScheduled operation may have been written to ignore interrupts which may result in a runaway thread. This could lead to more issues, eventually requiring NiFi to be restarted. This is usually a bug in the target Processor 'UpdateAttribute[id=95423ee6-e6a6-1220-83ad-af20577063bd]' that needs to be documented, reported and eventually fixed.
2018-01-10 10:16:42,937 WARN [StandardProcessScheduler Thread-2] o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled of 'PutHDFS' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not guarantee that the task will be canceled since the code inside current OnScheduled operation may have been written to ignore interrupts which may result in a runaway thread. This could lead to more issues, eventually requiring NiFi to be restarted. This is usually a bug in the target Processor 'PutHDFS[id=25e531ec-d873-1dec-acc9-ea745e7869ed]' that needs to be documented, reported and eventually fixed.
2018-01-10 10:16:46,993 WARN [StandardProcessScheduler Thread-4] o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled of 'LogAttribute' processor to finish. An attempt is made to cancel the task via Thread.interrupt(). However it does not guarantee that the task will be canceled since the code inside current OnScheduled operation may have been written to ignore interrupts which may result in a runaway thread. This could lead to more issues, eventually requiring NiFi to be restarted. This is usually a bug in the target Processor 'LogAttribute[id=9a683a06-aa24-19b5-ffff-ffff944a0216]' that needs to be documented, reported and eventually fixed.
While we should avoid having misbehaving Processors to begin with, the framework must also tolerate them and should not allow one misbehaving Processor to affect other Processors.
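The warning text above describes a timeout followed by a cancellation attempt via Thread.interrupt(). The following is a minimal sketch of that general pattern in plain Java, not NiFi's actual implementation. The class and method names (OnScheduledTimeoutSketch, runWithTimeout) are hypothetical, and the dedicated single-thread executor is an assumption made here so that one slow task cannot occupy a thread that other tasks are waiting for.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bound the time spent waiting for an "OnScheduled"-style task.
final class OnScheduledTimeoutSketch {

    /**
     * Runs the task on its own daemon thread and waits up to timeoutMillis.
     * Returns true if the task finished (normally or with an exception) in time,
     * false if it timed out and a cancellation was attempted.
     */
    static boolean runWithTimeout(Runnable onScheduled, long timeoutMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "on-scheduled-sketch");
            t.setDaemon(true); // a runaway task should not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(onScheduled);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker thread; code that ignores
            // interrupts keeps running, which is the "runaway thread" case.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return true; // the task finished, although exceptionally
        } finally {
            executor.shutdown(); // accept no further work; does not block
        }
    }
}
```

In this sketch the caller gets control back after the timeout whether or not the task honours the interrupt. A task that ignores interrupts still holds its own thread, but only that thread.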
We can "approximate" this issue by following these steps:
1. Create 1 DebugFlow Processor. Auto-terminate its success & failure relationships. Set the "@OnScheduled Pause Time" property to "2 mins"
2. Copy & paste this DebugFlow Processor so that there are at least 8 of them.
3. Create a GenerateFlowFile Processor and an UpdateAttribute Processor. Send success of GenerateFlowFile to UpdateAttribute.
4. Start all of the DebugFlow Processors.
5. Start the GenerateFlowFile and UpdateAttribute Processors.
In this scenario, we will not see the above log messages, because after 1 minute the DebugFlow Processor is interrupted and its @OnScheduled method completes exceptionally. However, GenerateFlowFile and UpdateAttribute still do not start running until the 2-minute window has elapsed. If DebugFlow did not complete exceptionally, GenerateFlowFile and UpdateAttribute would never start running, and we would see the above messages in the log.
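The blocking effect in this reproduction can be illustrated outside NiFi with a fixed-size thread pool. This is a minimal sketch under stated assumptions: the pool size of 8 is inferred from step 2 ("at least 8 of them") and is not confirmed by the report, and the class and method names (SchedulerStarvationSketch, quickTaskStartsWhileBlocked) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: blocking tasks exhaust a fixed pool and delay unrelated work.
final class SchedulerStarvationSketch {

    /**
     * Submits blockingTasks tasks that do not return until released, then one quick task.
     * Returns true if the quick task started within 200 ms while the blockers were still held.
     */
    static boolean quickTaskStartsWhileBlocked(int poolSize, int blockingTasks)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch blockersRunning =
                new CountDownLatch(Math.min(poolSize, blockingTasks));
        CountDownLatch quickStarted = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockingTasks; i++) {
                pool.execute(() -> {
                    blockersRunning.countDown();
                    try {
                        release.await(); // stands in for an @OnScheduled method that does not return
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            blockersRunning.await(); // every blocker that can run now holds a pool thread
            pool.execute(quickStarted::countDown); // stands in for starting GenerateFlowFile
            return quickStarted.await(200, TimeUnit.MILLISECONDS);
        } finally {
            release.countDown(); // let the blockers finish
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("8 blockers, pool of 8: " + quickTaskStartsWhileBlocked(8, 8));
        System.out.println("7 blockers, pool of 8: " + quickTaskStartsWhileBlocked(8, 7));
    }
}
```

With as many blockers as pool threads, the quick task sits in the queue until a blocker returns, which mirrors GenerateFlowFile and UpdateAttribute not starting. With one thread left free, the quick task should start right away.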
Issue Links
- relates to NIFI-4773 Database Fetch processor setup is incorrect (Resolved)