Accumulo / ACCUMULO-2964

Unexpected ThriftSecurityException from BatchScanner

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client, tserver
    • Labels:
      None

      Description

      I've only seen this a handful of times, when writing or running tests that stop and restart tservers. When a tserver is restarted, a thread (typically running in the master) is trying to read a table, so it keeps polling until the tserver comes up.

      Very infrequently, the client gets a ThriftSecurityException with a code of DEFAULT_SECURITY_ERROR and a message of "Unknown security exception". There is no additional information in the client log (from the thrift call inside the batchscanner), and the tserver log contains no error messages at all.

      The error that the client saw:

      2014-07-01 04:18:18,971 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host:58090 msg : null
      ThriftSecurityException(user:!SYSTEM, code:null)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10045)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10022)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result.read(TabletClientService.java:9961)
              at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:313)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:293)
              at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:632)
              at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:592)
              at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablets(MetadataLocationObtainer.java:181)
              at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:667)
              at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:337)
              at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:660)
              at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:610)
              at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:440)
              at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:226)
              at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
              at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
              at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.createWork(DistributedWorkQueueWorkAssigner.java:161)
              at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.assignWork(DistributedWorkQueueWorkAssigner.java:140)
              at org.apache.accumulo.master.replication.WorkDriver.run(WorkDriver.java:97)
      

      The interesting part is that when the client saw this message, the new TabletServer had already started, and the old TabletServer appears to have been dead for 20s. So the client in the master had been polling for 20s and getting a ConnectException (connection refused), which is expected. I don't know why we got this exception after that length of time.
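
      For reference, the polling behavior described above can be sketched roughly as follows. This is only an illustration, not the actual Accumulo client code: the class PollUntilUp, the method poll, and its parameters are hypothetical names. A ConnectException is treated as "server not up yet" and retried, while anything else (such as the ThriftSecurityException seen here) is surfaced to the caller immediately.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Accumulo. */
public class PollUntilUp {

  /**
   * Calls op until it succeeds or maxAttempts is exhausted.
   * ConnectException is retried after sleepMillis; any other
   * exception propagates to the caller on the first occurrence.
   */
  public static <T> T poll(Callable<T> op, int maxAttempts, long sleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be positive");
    }
    ConnectException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (ConnectException e) {
        // Expected while the server is down: remember it and try again.
        last = e;
        Thread.sleep(sleepMillis);
      }
    }
    // Every attempt was refused; report the most recent failure.
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated scan: refused twice, then succeeds on the third attempt.
    String result = poll(() -> {
      if (++calls[0] < 3) {
        throw new ConnectException("Connection refused");
      }
      return "scan ok";
    }, 10, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

      With a loop like this, an unexpected exception type ends the polling right away, which matches what the client reported.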

      The infrequency with which I see this makes me wonder whether the random ports in the new tabletserver are somehow re-grabbing the old tserver's thrift client service port, and something is unexpectedly being interpreted as this ThriftSecurityException. That's the only thing that seems remotely possible to me.
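
      As a sanity check on the port idea, here is a minimal sketch showing that a port released by one listener can be bound again by another right away. It uses only plain java.net sockets; the class PortReuseSketch and the method rebindSamePort are hypothetical names, and the second bind asks for the old port explicitly. It therefore only shows that the reuse is possible, not that random port assignment in a tserver would actually pick the same port. The second bind can also fail if another process grabs the port in between.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical demo for illustration; not part of Accumulo. */
public class PortReuseSketch {

  /**
   * Binds an OS-assigned port, closes that listener, then binds a new
   * listener to the same port number. Returns {firstPort, secondPort}.
   */
  public static int[] rebindSamePort() throws IOException {
    int firstPort;
    // "Old server": port 0 lets the OS pick a free port.
    try (ServerSocket oldServer = new ServerSocket(0)) {
      firstPort = oldServer.getLocalPort();
    }
    // "New server": explicitly request the port the old one just released.
    try (ServerSocket newServer = new ServerSocket(firstPort)) {
      return new int[] {firstPort, newServer.getLocalPort()};
    }
  }

  public static void main(String[] args) throws IOException {
    int[] ports = rebindSamePort();
    System.out.println("old port: " + ports[0] + ", new port: " + ports[1]);
  }
}
```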

          Activity

          Josh Elser made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 1.7.0 [ 12324607 ]
          Fix Version/s 1.6.3 [ 12329154 ]
          Resolution Cannot Reproduce [ 5 ]
          Corey J. Nolet made changes -
          Fix Version/s 1.6.3 [ 12329154 ]
          Fix Version/s 1.6.2 [ 12328644 ]
          Josh Elser made changes -
          Link This issue relates to ACCUMULO-2746 [ ACCUMULO-2746 ]
          Corey J. Nolet made changes -
          Fix Version/s 1.6.2 [ 12328644 ]
          Fix Version/s 1.6.1 [ 12325441 ]
          ASF subversion and git services made changes -
          Time Spent 10m [ 600 ]
          Worklog Id 18116 [ 18116 ]
          Remaining Estimate 0h [ 0 ]
          Josh Elser made changes -
          Priority Critical [ 2 ] Major [ 3 ]
          Josh Elser made changes -
          Fix Version/s 1.6.1 [ 12325441 ]
          Josh Elser made changes -
          Priority Minor [ 4 ] Critical [ 2 ]
          Josh Elser made changes -
          Field Original Value New Value
          Link This issue is related to ACCUMULO-2963 [ ACCUMULO-2963 ]
          Josh Elser created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Josh Elser
            • Votes:
              0
              Watchers:
              0

                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0h
                Logged:
                10m