Description
Running continuous ingest (CI) with agitation, I see lots of duplicate messages in the monitor whenever a tserver dies.
WARN | Error connecting to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |
ERROR | error sending update to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused |
These always occur in pairs, within the same millisecond and from the same tserver. I think they are updates to the metadata table sent by these tservers (for example, flushes or compactions) that fail because the dead server was hosting the corresponding metadata tablet, but the exact source doesn't really matter here.
The culprit is in Writer.java where we log-and-rethrow in updateServer():
} catch (TTransportException e) {
  log.warn("Error connecting to " + server + ": " + e);
  throw e;
}
and then later log again in update():
} catch (TException e) {
  log.error("error sending update to " + tabLoc.tablet_location + ": " + e);
  TabletLocator.getLocator(instance, table).invalidateCache(tabLoc.tablet_extent);
}
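A minimal sketch of one way to remove the duplicate, assuming the fix is to log only at the point where the exception is actually handled. The TException and TTransportException classes below are local stand-ins for the Thrift types, the logger is replaced by a list so the number of messages can be counted, and the class and method bodies are hypothetical; this is not the actual Writer.java code.

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch of the "log once, where handled" pattern.
public class LogOnceSketch {

    // Stand-in for org.apache.thrift.TException (assumption, not the real class).
    public static class TException extends Exception {
        public TException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Stand-in for org.apache.thrift.transport.TTransportException (assumption).
    public static class TTransportException extends TException {
        public TTransportException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Records log messages instead of printing them, so duplicates are observable.
    public static final List<String> LOG = new ArrayList<>();

    // Hypothetical updateServer(): it no longer logs, it only lets the
    // exception propagate. Here it always fails, simulating a dead tserver.
    static void updateServer(String server) throws TException {
        ConnectException cause = new ConnectException("Connection refused");
        throw new TTransportException(cause.toString(), cause);
    }

    // Hypothetical update(): the single place that logs the failure, because
    // it is also where the failure is handled (the cache is invalidated).
    public static void update(String tabletLocation) {
        try {
            updateServer(tabletLocation);
        } catch (TException e) {
            LOG.add("error sending update to " + tabletLocation + ": " + e.getMessage());
            // In the real code, the locator cache for the tablet extent would be invalidated here.
        }
    }

    public static void main(String[] args) {
        update("tserver1.example.com:10011");
        // Prints exactly one line:
        // error sending update to tserver1.example.com:10011: java.net.ConnectException: Connection refused
        LOG.forEach(System.out::println);
    }
}
```

Each failed update now produces one message instead of a WARN/ERROR pair. Keeping the warn in updateServer() and dropping the error in update() would also log once, but update() is where the failure is acted on, so it has the context (tablet location, cache invalidation) that makes the message useful.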