Apache NiFi / NIFI-6849

On startup, NiFi should be more liberal about what it's willing to inherit from cluster


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: Core Framework
    • Labels: None

    Description

      On startup, if an instance is configured to be clustered, the instance must connect to the Cluster Coordinator and download the existing cluster flow, access policies, and users and groups. The instance then performs some checks to ensure that the local flow matches the cluster's flow. If it doesn't, the node fails to start up and logs errors stating that the local flow is different from the cluster's flow.

      This was done in order to facilitate debugging. If a particular node is not behaving as expected, a user is able to disconnect the node from the cluster and make modifications to the node. If the node is then restarted, it may not be desirable to lose those changes.

      However, in the vast majority of cases (probably over 98% of the time), what the user really wants is for the node to just rejoin the cluster and inherit the cluster's flow, especially if the node was disconnected only because it failed to make a modification. The same problem applies to how the Users, Groups, and Policies are inherited.

      As a result, we should make the following modifications:
      1) If Users, Groups, or Access Policies cannot be inherited, continue to fail unless the flow is empty. If the flow is empty, there is little point in retaining the local authorization configuration, because it does not apply to anything. In that case, just inherit whatever the cluster has. But first, make a backup of the existing policies, users, and groups, so that users can manually revert if they end up needing to.

      2) If the flow differs from the cluster flow, check the proposed flow to see if it removes any existing connections. If it does remove a connection, and that connection has data queued locally, continue to fail. Otherwise, create a backup of the flow and replace the node's flow with the cluster flow.
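      The two rules above can be sketched as a pair of decision functions. This is only an illustration of the proposed behavior, not NiFi's actual code: the `Flow` model, the `Decision` values, and the function names `decide_authorizations` and `decide_flow` are all hypothetical, and the sketch assumes that local queue sizes are available as a mapping from connection id to queued FlowFile count.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class Decision(Enum):
    INHERIT = "inherit"                        # nothing conflicts; take the cluster's state
    BACKUP_AND_INHERIT = "backup_and_inherit"  # back up local state, then take the cluster's
    FAIL = "fail"                              # refuse to join, as before


@dataclass(frozen=True)
class Flow:
    """Hypothetical, heavily simplified stand-in for a NiFi flow."""
    connections: frozenset[str] = frozenset()  # connection ids
    components: frozenset[str] = frozenset()   # processors, ports, and so on

    def is_empty(self) -> bool:
        return not self.connections and not self.components


def decide_authorizations(local_flow: Flow, can_inherit: bool) -> Decision:
    """Rule 1: users, groups, and access policies."""
    if can_inherit:
        return Decision.INHERIT
    # Authorizations on an empty flow apply to nothing, so replacing them
    # is safe as long as a backup is written first.
    if local_flow.is_empty():
        return Decision.BACKUP_AND_INHERIT
    return Decision.FAIL


def decide_flow(local: Flow, cluster: Flow, queued: Mapping[str, int]) -> Decision:
    """Rule 2: the flow itself. `queued` maps connection id to local queue size."""
    if local == cluster:
        return Decision.INHERIT
    # Connections that exist locally but are absent from the cluster's flow.
    removed = local.connections - cluster.connections
    # Dropping a connection that still holds data would lose that data.
    if any(queued.get(conn_id, 0) > 0 for conn_id in removed):
        return Decision.FAIL
    return Decision.BACKUP_AND_INHERIT
```

      In this sketch, a node whose flow differs from the cluster's fails only when inheriting would discard locally queued data. Every other mismatch results in a backup followed by inheritance.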

            People

              Assignee: Mark Payne (markap14)
              Reporter: Mark Payne (markap14)
              Votes: 2
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 40m