Accumulo / ACCUMULO-2716

Duplicate connection loss logging in Writer

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.5.2, 1.6.0
    • Component/s: client
    • Labels:

      Description

      Running continuous ingest (CI) with agitation, I see many duplicated messages in the monitor whenever a tserver dies.

      WARN Error connecting to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
      ERROR error sending update to tserver1.example.com:10011: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused

      These always occur in pairs, at the same millisecond, and come from the same tserver. I think they are updates to the metadata table from these tservers, such as flushes or compactions, which fail because the dead server was hosting the corresponding metadata tablet, but the cause doesn't really matter.

      The culprit is in Writer.java where we log-and-rethrow in updateServer():

          } catch (TTransportException e) {
            log.warn("Error connecting to " + server + ": " + e);
            throw e;
          }
      

      and then later log again in update():

            } catch (TException e) {
              log.error("error sending update to " + tabLoc.tablet_location + ": " + e);
              TabletLocator.getLocator(instance, table).invalidateCache(tabLoc.tablet_extent);
            }
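
      One way to avoid the pair of messages is to log only at the level that actually handles the failure. The following is a minimal sketch of that idea, not the committed fix. The class SingleLogSketch, its in-memory LOG list, and the TransportException stand-in are hypothetical names used for illustration; the real code uses Thrift's TTransportException and the project's logger.

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: report a connection failure once, where it is handled. */
final class SingleLogSketch {
  /** Stand-in for the real logger; records messages so they can be counted. */
  static final List<String> LOG = new ArrayList<>();

  /** Hypothetical stand-in for Thrift's TTransportException. */
  static class TransportException extends Exception {
    TransportException(Throwable cause) {
      super(cause);
    }
  }

  /** Lower level: does not log, only propagates the failure with its cause. */
  static void updateServer(String server, boolean serverUp) throws TransportException {
    if (!serverUp) {
      throw new TransportException(new ConnectException("Connection refused"));
    }
  }

  /** Upper level: the only place that logs, because it also handles the failure. */
  static boolean update(String server, boolean serverUp) {
    try {
      updateServer(server, serverUp);
      return true;
    } catch (TransportException e) {
      LOG.add("ERROR error sending update to " + server + ": " + e);
      // The real code would invalidate the cached tablet location here.
      return false;
    }
  }

  public static void main(String[] args) {
    update("tserver1.example.com:10011", false);
    System.out.println(LOG.size() + " message(s) logged");
    LOG.forEach(System.out::println);
  }
}
```

      The alternative is to keep both log statements but lower the inner one to a debug level, which preserves the detail without showing it twice on the monitor.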
      

        Activity

        Mike Drob created issue
        Mike Drob made changes
        Field: Description
        Original Value: the sample ERROR line named the host a2422.halxg.cloudera.com:10011
        New Value: the sample ERROR line names the host tserver1.example.com:10011; the rest of the description is unchanged

        Mike Drob made changes
        Affects Version/s: added 1.5.1 [ 12324399 ]
        Mike Drob made changes
        Status: Open [ 1 ] to Resolved [ 5 ]
        Fix Version/s: added 1.5.2 [ 12326272 ], 1.6.1 [ 12325441 ], 1.7.0 [ 12324607 ]
        Resolution: Fixed [ 1 ]
        Christopher Tubbs made changes
        Fix Version/s: added 1.6.0 [ 12322468 ]; removed 1.7.0 [ 12324607 ] and 1.6.1 [ 12325441 ]

          People

          • Assignee: Mike Drob
          • Reporter: Mike Drob
          • Votes: 0
          • Watchers: 2
