Details
- Bug
- Status: Resolved
- Normal
- Resolution: Fixed
- None
- Normal
- Clients
Description
When a node goes down, the other nodes learn about it through gossip.
I do see the corresponding log message, which comes from markDead in Gossiper.java:
private void markDead(InetAddress addr, EndpointState localState)
{
    if (logger.isTraceEnabled())
        logger.trace("marking as down {}", addr);
    localState.markDead();
    liveEndpoints.remove(addr);
    unreachableEndpoints.put(addr, System.nanoTime());
    logger.info("InetAddress {} is now DOWN", addr);
    for (IEndpointStateChangeSubscriber subscriber : subscribers)
        subscriber.onDead(addr, localState);
    if (logger.isTraceEnabled())
        logger.trace("Notified " + subscribers);
}
Cassandra's system log shows: "InetAddress 192.168.101.1 is now DOWN".
On all the other nodes, the client side (Java driver) then reports "Cannot connect to any host, scheduling retry in 1000 milliseconds". The clients do eventually reconnect, but some queries fail in the meantime.
It seems to me that when the server pushes the nodeDown event, it calls getRpcAddress(endpoint) and therefore sends localhost as the address in the nodeDown event.
From org.apache.cassandra.transport.Server.java:
public void onDown(InetAddress endpoint)
{
    server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint), server.socket.getPort()));
}
getRpcAddress returns localhost for any endpoint if cassandra.yaml uses localhost for rpc_address (which, by the way, is the default).
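To make the suspected failure mode concrete, here is a minimal standalone sketch. It is not Cassandra or driver code: suspectedGetRpcAddress and applyNodeDown are hypothetical helpers that only model the behaviour described above, and the sketch assumes the client is connected to its local node through the loopback address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

class RpcAddressSketch {
    // Hypothetical simplification of the suspected behaviour: with rpc_address
    // set to localhost, the endpoint argument is ignored and loopback is returned.
    static InetAddress suspectedGetRpcAddress(InetAddress endpoint) {
        return InetAddress.getLoopbackAddress();
    }

    // Hypothetical client-side reaction to a nodeDown event: the advertised
    // address is removed from the set of hosts the client considers live.
    static Set<InetAddress> applyNodeDown(Set<InetAddress> liveHosts, InetAddress advertised) {
        Set<InetAddress> remaining = new HashSet<>(liveHosts);
        remaining.remove(advertised);
        return remaining;
    }

    public static void main(String[] args) throws UnknownHostException {
        // The node that actually went down (address taken from the log line above).
        InetAddress downNode = InetAddress.getByName("192.168.101.1");

        // Assumption: the client talks to its own, still healthy, local node via loopback.
        Set<InetAddress> liveHosts = new HashSet<>();
        liveHosts.add(InetAddress.getLoopbackAddress());

        InetAddress advertised = suspectedGetRpcAddress(downNode);
        Set<InetAddress> remaining = applyNodeDown(liveHosts, advertised);

        System.out.println("node that went down: " + downNode.getHostAddress());
        System.out.println("address in nodeDown: " + advertised.getHostAddress());
        System.out.println("client hosts left:   " + remaining.size());
    }
}
```

Under these assumptions the event names the loopback address instead of the remote node, so the client drops the healthy local node it is connected to. That would be consistent with the "Cannot connect to any host" message, although the sketch does not prove that this is what the driver does.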