  1. Apache NiFi
  2. NIFI-7866

When cluster coordinator dies, other nodes may have trouble rejoining cluster

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13.0
    • Component/s: Core Framework
    • Labels: None

    Description

      When the cluster coordinator is lost, the remaining nodes must begin communicating with a newly elected cluster coordinator. This is handled by the `StandardFlowService`.

      When the `handleReconnectionRequest` method is called and the request provided does not contain the dataflow, the node is to connect to the cluster coordinator and request the dataflow:

      private void handleReconnectionRequest(final ReconnectionRequestMessage request) {
          try {
              logger.info("Processing reconnection request from cluster coordinator.");
      
              // reconnect
              ConnectionResponse connectionResponse = new ConnectionResponse(getNodeId(), request.getDataFlow(),
                      request.getInstanceId(), request.getNodeConnectionStatuses(), request.getComponentRevisions());
      
              if (connectionResponse.getDataFlow() == null) {
                  logger.info("Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.");
                  connectionResponse = connect(false, false, createDataFlowFromController());
              }
      
              loadFromConnectionResponse(connectionResponse);
      
      ... 

      However, if the call above to `connect(false, false, createDataFlowFromController())` returns null (which is a valid case), that null value is passed along to `loadFromConnectionResponse`. That method expects a non-null `connectionResponse` and throws a NullPointerException, resulting in the following stack trace (based on NiFi 1.11.4):

      2020-09-29 10:18:53,324 ERROR [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Handling reconnection request failed due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
      org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
          at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1035)
          at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:668)
          at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:109)
          at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:415)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.NullPointerException: null
          at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:989)
          ... 4 common frames omitted

      This results in the node not reconnecting to the cluster.
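
      The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the NiFi source and not necessarily the change committed for this issue: `ReconnectSketch`, the simplified `ConnectionResponse`, the `Outcome` enum, and the `Supplier`-based `connect` parameter are all hypothetical stand-ins. It only shows one way a null check placed before `loadFromConnectionResponse` would avoid the NullPointerException when no coordinator can be reached.

```java
import java.util.function.Supplier;

public class ReconnectSketch {

    /** Hypothetical stand-in for NiFi's ConnectionResponse; only the data flow is modeled. */
    static final class ConnectionResponse {
        private final String dataFlow;

        ConnectionResponse(String dataFlow) {
            this.dataFlow = dataFlow;
        }

        String getDataFlow() {
            return dataFlow;
        }
    }

    /** Hypothetical result type so the outcome of each call can be observed. */
    enum Outcome { LOADED, NO_COORDINATOR }

    /**
     * Simplified reconnection handler. The connect supplier models a connect(...)
     * call that may legitimately return null when no coordinator responds.
     */
    static Outcome handleReconnectionRequest(String requestDataFlow,
                                             Supplier<ConnectionResponse> connect) {
        ConnectionResponse response = new ConnectionResponse(requestDataFlow);

        if (response.getDataFlow() == null) {
            // The request carried no data flow: ask the cluster instead.
            response = connect.get();
        }

        if (response == null) {
            // Guard: without this check the call below would dereference null.
            return Outcome.NO_COORDINATOR;
        }

        loadFromConnectionResponse(response);
        return Outcome.LOADED;
    }

    /** Like the method described in the issue, this assumes a non-null response. */
    static void loadFromConnectionResponse(ConnectionResponse response) {
        response.getDataFlow(); // would throw NullPointerException on a null response
    }

    public static void main(String[] args) {
        // No flow in the request and no coordinator reachable: handled without an exception.
        System.out.println(handleReconnectionRequest(null, () -> null));
        // No flow in the request, but the cluster supplies one.
        System.out.println(handleReconnectionRequest(null, () -> new ConnectionResponse("local-flow")));
        // The request already carries a flow, so connect is never consulted.
        System.out.println(handleReconnectionRequest("flow-from-request", () -> null));
    }
}
```

      In this sketch the caller decides what to do with `NO_COORDINATOR` (for example, log a warning and retry later); what the real service should do in that situation is a design decision for the actual fix.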

            People

              markap14 Mark Payne