Accumulo / ACCUMULO-3880

Malformed Configuration Causes tservers To Shutdown

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 1.6.1, 1.6.2, 1.7.0
    • Fix Version/s: 1.6.3, 1.7.0
    • Component/s: tserver
    • Labels: None
    • Environment: HDP 2.2.7.0 to HDP 2.3.0.0 Upgrade

    Description

      During a rolling upgrade from HDP 2.2 to HDP 2.3, the Accumulo tracer fails to start because it is unable to find any tabletservers. The tabletservers were updated to HDP 2.3 earlier in the upgrade process and did come online briefly.

      The PID file still exists, but the tservers are definitely down:

      [root@c6401 accumulo]# cat accumulo-accumulo-tserver.pid
      6075
      [root@c6401 accumulo]# ps -a | grep 6075
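
The same stale-PID check can be sketched in Java. This is only an illustration, assuming Java 11 or later; the class name PidFileCheck and its helper methods are hypothetical and are not part of Accumulo. It reads a PID file such as accumulo-accumulo-tserver.pid and asks the JVM whether a process with that ID is still alive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Accumulo code: checks whether the PID recorded
// in a PID file still refers to a live process.
final class PidFileCheck {

  /** Parses the contents of a PID file (a single number, possibly followed by a newline). */
  static long parsePid(String contents) {
    return Long.parseLong(contents.trim());
  }

  /** Returns true if a process with this PID is visible to the JVM and reported as alive. */
  static boolean isAlive(long pid) {
    return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
  }

  public static void main(String[] args) throws IOException {
    // The default file name is taken from the shell session above.
    Path pidFile = Path.of(args.length > 0 ? args[0] : "accumulo-accumulo-tserver.pid");
    long pid = parsePid(Files.readString(pidFile));
    String state = isAlive(pid) ? "is running" : "is not running (stale PID file)";
    System.out.println("pid " + pid + " " + state);
  }
}
```

Unlike the `ps -a | grep` approach, this looks the process up by ID rather than matching text in the process listing.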
      

      It seems like the problem might be located in the following piece of code:

          private void checkPermission(TCredentials credentials, String lock, final String request) throws ThriftSecurityException {
            boolean fatal = false;
            try {
              log.trace("Got " + request + " message from user: " + credentials.getPrincipal());
              if (!security.canPerformSystemActions(credentials)) {
                log.warn("Got " + request + " message from user: " + credentials.getPrincipal());
                throw new ThriftSecurityException(credentials.getPrincipal(), SecurityErrorCode.PERMISSION_DENIED);
              }
            } catch (ThriftSecurityException e) {
              log.warn("Got " + request + " message from unauthenticatable user: " + e.getUser());
              if (getCredentials().getToken().getClass().getName().equals(credentials.getTokenClassName())) {
                log.error("Got message from a service with a mismatched configuration. Please ensure a compatible configuration.", e);
                fatal = true;
              }
              throw e;
            } finally {
              if (fatal) {
                Halt.halt(1, new Runnable() {
                  @Override
                  public void run() {
                    gcLogger.logGCInfo(TabletServer.this.getConfiguration());
                  }
                });
              }
            }
      

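      In this path, a malformed principal causes a Halt: when a ThriftSecurityException is caught and the request's token class name matches that of the tserver's own credentials, fatal is set to true and the finally block calls Halt.halt(1, ...).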
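
To make the halt condition easier to reason about, here is a minimal sketch of the decision the quoted method appears to make, reduced to a pure function. The class HaltDecision and the method shouldHalt are hypothetical names introduced for illustration; only the comparison of token class names is taken from the excerpt. Note that in the excerpt the PERMISSION_DENIED exception is thrown inside the same try block, so it reaches the same catch as an authentication failure.

```java
// Hypothetical model of the condition in checkPermission, not Accumulo code.
final class HaltDecision {

  /**
   * Returns true when the quoted logic would set fatal = true:
   * a ThriftSecurityException was caught AND the token class name sent with
   * the request equals the class name of the tserver's own token.
   */
  static boolean shouldHalt(boolean securityExceptionCaught,
                            String serverTokenClassName,
                            String requestTokenClassName) {
    return securityExceptionCaught && serverTokenClassName.equals(requestTokenClassName);
  }

  public static void main(String[] args) {
    String password = "org.apache.accumulo.core.client.security.tokens.PasswordToken";
    String kerberos = "org.apache.accumulo.core.client.security.tokens.KerberosToken";

    // Failed check, same token class: treated as a mismatched configuration.
    System.out.println(shouldHalt(true, password, password));
    // Failed check, different token class: the exception is rethrown without halting.
    System.out.println(shouldHalt(true, password, kerberos));
    // No exception: nothing to decide.
    System.out.println(shouldHalt(false, password, password));
  }
}
```

Under this reading, any rejected request that happens to use the same token class as the tserver is enough to trigger the halt, which is consistent with the behavior reported in this issue.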

      From the tserver logs:

      2015-06-01 19:25:30,462 [rpc.TServerUtils] DEBUG: Instantiating default, unsecure custom half-async Thrift server
      2015-06-01 19:25:30,468 [tserver.TabletServer] INFO : address = c6401.ambari.apache.org:9997
      2015-06-01 19:25:30,510 [tserver.TabletServer] INFO : Waiting for tablet server lock
      

      There is also no content in the *.out or *.err files for tserver.

            People

              Assignee: Josh Elser (elserj)
              Reporter: Jonathan Hurley (jonathanhurley)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h