Uploaded image for project: 'Apache Cassandra'
  1. Apache Cassandra
  2. CASSANDRA-18096

Do not spam the logs with MigrationCoordinator not able to pull schemas

    XMLWordPrintableJSON

Details

    Description

      When a node is joining a cluster, there is this output upon startup:

      cassandra_node_6  | INFO  [GossipStage:1] 2022-12-06 12:48:07,187 Gossiper.java:1413 - Node /172.19.0.5:7000 is now part of the cluster
      cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
      cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
      cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
      cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
      cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
      cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
      cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
      

      This is there for a lot of already existing nodes. You got the idea. This log is misleading, it indeed can not pull requests because "node is down" but it is not down, it just thinks it is because Gossiper has not had a chance to receive any gossip about these nodes yet.

      I put there more logs and it writes this:

       MigrationCoordinator.java:655 - Can't send schema pull request: node /172.19.0.5:7000 is down: NORMAL, isAlive: false
      

      When I do this:

              if (!gossiper.hasEndpointState(endpoint))
                  return;
      
              if (!gossiper.isAlive(endpoint))
              {
                  EndpointState endpointStateForEndpoint = gossiper.getEndpointStateForEndpoint(endpoint);
                  String status = Gossiper.getGossipStatus(endpointStateForEndpoint);
                  logger.warn("Can't send schema pull request: node {} is down: {}, isAlive: {}", endpoint, status, endpointStateForEndpoint.isAlive());
                  callback.onFailure(endpoint, RequestFailureReason.UNKNOWN);
                  return;
              }
      

      So it is in NORMAL but it is not alive yet which is quite strange.

      The fix is to still return prematurely but we would not skip the logging on WARN only in case isAlive is false and status is _not_NORMAL. We would however still log on TRACE at least.

      (1) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/MigrationCoordinator.java#L648-L653

      Attachments

        Activity

          People

            smiklosovic Stefan Miklosovic
            smiklosovic Stefan Miklosovic
            Stefan Miklosovic
            Michael Semb Wever
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 40m
                40m