ACCUMULO-3880

Malformed Configuration Causes tservers To Shutdown

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 1.6.1, 1.6.2, 1.7.0
    • Fix Version/s: 1.6.3, 1.7.0
    • Component/s: tserver
    • Labels:
      None
    • Environment:

      HDP 2.2.7.0 to HDP 2.3.0.0 Upgrade

      Description

      During a rolling upgrade from HDP 2.2 to HDP 2.3, the Accumulo tracer fails to start because it cannot find any tablet servers. The tablet servers were upgraded to HDP 2.3 earlier in the process and did come online briefly.

      The PID file still exists, but the tservers are definitely down:

      [root@c6401 accumulo]# cat accumulo-accumulo-tserver.pid
      6075
      [root@c6401 accumulo]# ps -a | grep 6075
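
The same stale-PID check can be sketched in Java. This is a hypothetical standalone diagnostic, not part of Accumulo: the class name `PidFileCheck` and the default file name are assumptions for illustration, and it relies on `ProcessHandle` and `Files.readString`, so it assumes Java 11 or newer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reports whether the PID stored in a pid file is still running.
final class PidFileCheck {

  /** Returns true if the PID recorded in pidFile belongs to a live process. */
  static boolean isAlive(Path pidFile) throws IOException {
    // The pid file is expected to hold a single decimal PID, e.g. "6075".
    long pid = Long.parseLong(Files.readString(pidFile).trim());
    // ProcessHandle.of returns an empty Optional when no such process exists.
    return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
  }

  public static void main(String[] args) throws IOException {
    Path pidFile = Path.of(args.length > 0 ? args[0] : "accumulo-accumulo-tserver.pid");
    System.out.println(pidFile + ": " + (isAlive(pidFile) ? "running" : "stale PID file"));
  }
}
```

A "stale PID file" result corresponds to the situation above, where the file is present but `ps` finds no matching process.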
      

      It seems like the problem might be located in the following piece of code:

          private void checkPermission(TCredentials credentials, String lock, final String request) throws ThriftSecurityException {
            boolean fatal = false;
            try {
              log.trace("Got " + request + " message from user: " + credentials.getPrincipal());
              if (!security.canPerformSystemActions(credentials)) {
                log.warn("Got " + request + " message from user: " + credentials.getPrincipal());
                throw new ThriftSecurityException(credentials.getPrincipal(), SecurityErrorCode.PERMISSION_DENIED);
              }
            } catch (ThriftSecurityException e) {
              log.warn("Got " + request + " message from unauthenticatable user: " + e.getUser());
              if (getCredentials().getToken().getClass().getName().equals(credentials.getTokenClassName())) {
                log.error("Got message from a service with a mismatched configuration. Please ensure a compatible configuration.", e);
                fatal = true;
              }
              throw e;
            } finally {
              if (fatal) {
                Halt.halt(1, new Runnable() {
                  @Override
                  public void run() {
                    gcLogger.logGCInfo(TabletServer.this.getConfiguration());
                  }
                });
              }
            }
          }
      

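      Here, a ThriftSecurityException raised for a caller whose token class matches the tserver's own sets fatal, and the finally block then calls Halt.halt. A malformed principal therefore causes the tserver to halt.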
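
The control flow of the quoted method can be reproduced in isolation. The following is a minimal sketch, not Accumulo code: `HaltFlowSketch`, `DeniedException`, and the event strings are hypothetical stand-ins, and the halt is only recorded rather than performed so the ordering can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the try/catch/finally flow in checkPermission.
final class HaltFlowSketch {

  /** Stand-in for ThriftSecurityException (illustration only). */
  static final class DeniedException extends Exception {
    DeniedException(String user) {
      super("permission denied: " + user);
    }
  }

  static void checkPermission(List<String> events, boolean permitted,
      String serverTokenClass, String requestTokenClass) throws DeniedException {
    boolean fatal = false;
    try {
      if (!permitted) {
        throw new DeniedException("user");
      }
      events.add("allowed");
    } catch (DeniedException e) {
      events.add("warn");
      // Same comparison as the quoted method: a matching token class name
      // is treated as a peer service with a mismatched configuration.
      if (serverTokenClass.equals(requestTokenClass)) {
        fatal = true;
      }
      throw e;
    } finally {
      // Runs before the rethrown exception reaches the caller.
      if (fatal) {
        events.add("halt"); // the quoted code calls Halt.halt(1, ...) here
      }
    }
  }

  /** Runs one check and returns the recorded events in order. */
  static List<String> run(boolean permitted, String serverTokenClass, String requestTokenClass) {
    List<String> events = new ArrayList<>();
    try {
      checkPermission(events, permitted, serverTokenClass, requestTokenClass);
    } catch (DeniedException e) {
      events.add("rethrown");
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(run(false, "PasswordToken", "PasswordToken"));
    System.out.println(run(false, "PasswordToken", "OtherToken"));
  }
}
```

With matching token class names the recorded order is warn, halt, rethrown. In the real tserver the halt ends the process, so the exception never reaches the caller and, as far as this report shows, nothing further is written to the logs.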

      From the tserver logs:

      2015-06-01 19:25:30,462 [rpc.TServerUtils] DEBUG: Instantiating default, unsecure custom half-async Thrift server
      2015-06-01 19:25:30,468 [tserver.TabletServer] INFO : address = c6401.ambari.apache.org:9997
      2015-06-01 19:25:30,510 [tserver.TabletServer] INFO : Waiting for tablet server lock
      

      There is also no content in the *.out or *.err files for tserver.

              People

              • Assignee: Josh Elser (elserj)
              • Reporter: Jonathan Hurley (jonathanhurley)
              • Votes: 0
              • Watchers: 4

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 0.5h