  1. Apache NiFi
  2. NIFI-10070

NiFi fails to delete/update component because it's still running, immediately after confirming that the component is stopped.

Details

    Description

      This issue was identified by analyzing the logs, code, etc., of the system tests. Many of the system tests require that the flow be torn down after each test (or after a set of tests). Teardown stops all processors and reporting tasks and disables all controller services. It then waits for them to fully stop or disable, as reported by the REST API. Finally, it purges any queues and deletes all components.

      However, the step that deletes the components occasionally fails. One node reports that the component cannot be deleted because it is still running, so the REST API returns a 409. Yet before making this request, we have already requested all components and verified that each one is STOPPED/DISABLED with no active threads.
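
      The pre-delete check described above can be sketched roughly as follows. This is a minimal illustration, not the system tests' actual code: ComponentSnapshot, allQuiesced, and waitUntilQuiesced are hypothetical names, and the fetch supplier stands in for whatever REST call returns the components.

```java
import java.util.List;
import java.util.function.Supplier;

final class TeardownCheckSketch {

    // Hypothetical, simplified view of what the client reads for each component.
    static final class ComponentSnapshot {
        final String runStatus;
        final int activeThreadCount;

        ComponentSnapshot(String runStatus, int activeThreadCount) {
            this.runStatus = runStatus;
            this.activeThreadCount = activeThreadCount;
        }
    }

    // True only if every component is STOPPED or DISABLED and has no active threads.
    static boolean allQuiesced(List<ComponentSnapshot> components) {
        for (ComponentSnapshot c : components) {
            boolean settled = "STOPPED".equals(c.runStatus) || "DISABLED".equals(c.runStatus);
            if (!settled || c.activeThreadCount != 0) {
                return false;
            }
        }
        return true;
    }

    // Polls the (assumed) fetch call until everything is quiesced or attempts run out.
    static boolean waitUntilQuiesced(Supplier<List<ComponentSnapshot>> fetch,
                                     int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (allQuiesced(fetch.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return false;
    }
}
```

      A check like this is only as reliable as the values it reads: if the response it inspects does not reflect every node, it can return true while one node still has an active thread.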

      If we look at the code that determines whether the components are STOPPED/DISABLED, it uses the "status" field in the Entity objects (reportingTaskEntity.getStatus().getRunStatus(), for example).

      However, the DTO also has a state field: ReportingTaskDTO.getState()

      We have a similar situation with Processors, Reporting Tasks, and Controller Services.

      In order to maintain backward compatibility, we need to leave both of these fields. However, the issue we have appears to be in the ReportingTaskEntityMerger, ProcessorEntityMerger, and ControllerServiceEntityMerger.

      These mergers do not merge the status field in the Entity; they take into account only the fields in the DTO. As a result, one node can report STOPPED with 0 threads while another node reports STOPPED with 1 thread. The merging logic may choose the STOPPED with 0 threads response, leading the client to conclude that the component is fully stopped. A subsequent delete or update then fails because the component is not in the desired state on all nodes.
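
      One way to picture a merge that avoids this problem is sketched below. It is an assumption-laden illustration rather than the actual merger code: RunStatus, NodeStatus, and merge are hypothetical names, and the rule shown (keep the least settled run status seen on any node, and sum the active threads) is one plausible conservative policy, not necessarily the one NiFi adopts.

```java
import java.util.Arrays;
import java.util.List;

final class StatusMergeSketch {

    // Hypothetical run states, ordered from most settled to least settled.
    enum RunStatus { STOPPED, STOPPING, RUNNING }

    // Hypothetical per-node view of a component's status.
    static final class NodeStatus {
        final RunStatus runStatus;
        final int activeThreadCount;

        NodeStatus(RunStatus runStatus, int activeThreadCount) {
            this.runStatus = runStatus;
            this.activeThreadCount = activeThreadCount;
        }
    }

    // Conservative merge: keep the least settled run status reported by any node
    // and sum the active threads across all nodes.
    static NodeStatus merge(List<NodeStatus> perNode) {
        RunStatus merged = RunStatus.STOPPED;
        int threads = 0;
        for (NodeStatus s : perNode) {
            if (s.runStatus.ordinal() > merged.ordinal()) {
                merged = s.runStatus;
            }
            threads += s.activeThreadCount;
        }
        return new NodeStatus(merged, threads);
    }

    static boolean isFullyStopped(NodeStatus s) {
        return s.runStatus == RunStatus.STOPPED && s.activeThreadCount == 0;
    }

    public static void main(String[] args) {
        // The scenario from the description: both nodes say STOPPED, one still has a thread.
        NodeStatus merged = merge(Arrays.asList(
                new NodeStatus(RunStatus.STOPPED, 0),
                new NodeStatus(RunStatus.STOPPED, 1)));
        // prints "STOPPED threads=1 fullyStopped=false"
        System.out.println(merged.runStatus + " threads=" + merged.activeThreadCount
                + " fullyStopped=" + isFullyStopped(merged));
    }
}
```

      With a merge of this kind, the combined response no longer looks fully stopped while any single node still has an active thread, so a client waiting on it would keep waiting instead of issuing a delete that returns a 409.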

      We need to update the three Entity Mergers to ensure that they also properly merge the status in the Entity objects.

            People

              thenatog Nathan Gough
              markap14 Mark Payne
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 20m