[CASSANDRA-18096] Do not spam the logs with MigrationCoordinator not able to pull schemas - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Low
Resolution: Fixed
Fix Version/s: 3.0.29, 3.11.15, 4.0.8, 4.1.1, 5.0-alpha1, 5.0
Component/s: Cluster/Schema
Labels:
None

Change Category:
Operability
Complexity:
Low Hanging Fruit
Platform:

All
Impacts:

None
Source Control Link:

https://github.com/apache/cassandra/commit/f55b2fb1b3a8af03a3424abfe7afa2ac60ac9dff
Test and Documentation Plan:

Hide

ci passing

Show
ci passing

Description

When a node is joining a cluster, there is this output upon startup:

cassandra_node_6  | INFO  [GossipStage:1] 2022-12-06 12:48:07,187 Gossiper.java:1413 - Node /172.19.0.5:7000 is now part of the cluster
cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.

This is there for a lot of already existing nodes. You got the idea. This log is misleading, it indeed can not pull requests because "node is down" but it is not down, it just thinks it is because Gossiper has not had a chance to receive any gossip about these nodes yet.

I put there more logs and it writes this:

 MigrationCoordinator.java:655 - Can't send schema pull request: node /172.19.0.5:7000 is down: NORMAL, isAlive: false

When I do this:

        if (!gossiper.hasEndpointState(endpoint))
            return;

        if (!gossiper.isAlive(endpoint))
        {
            EndpointState endpointStateForEndpoint = gossiper.getEndpointStateForEndpoint(endpoint);
            String status = Gossiper.getGossipStatus(endpointStateForEndpoint);
            logger.warn("Can't send schema pull request: node {} is down: {}, isAlive: {}", endpoint, status, endpointStateForEndpoint.isAlive());
            callback.onFailure(endpoint, RequestFailureReason.UNKNOWN);
            return;
        }

So it is in NORMAL but it is not alive yet which is quite strange.

The fix is to still return prematurely but we would not skip the logging on WARN only in case isAlive is false and status is _not_NORMAL. We would however still log on TRACE at least.

(1) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/MigrationCoordinator.java#L648-L653

Attachments

Activity

People

Assignee:: Stefan Miklosovic

Reporter:: Stefan Miklosovic

Authors:: Stefan Miklosovic

Reviewers:: Michael Semb Wever

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 06/Dec/22 12:53

Updated:: 12/Sep/23 13:03

Resolved:: 20/Dec/22 10:21

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

40m