Accumulo / ACCUMULO-2408

metadata table not assigned after root table is loaded

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0, 1.5.1
    • Fix Version/s: 1.4.5, 1.5.2, 1.6.0
    • Component/s: master
    • Labels:

      Description

      During a nightly integration test run, BigRootTabletIT failed, timing out after 4 minutes:

      java.lang.Exception: test timed out after 240000 milliseconds
      	at sun.misc.Unsafe.park(Native Method)
      	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
      	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
      	at org.apache.accumulo.core.client.admin.TableOperationsImpl.addSplits(TableOperationsImpl.java:437)
      	at org.apache.accumulo.test.functional.BigRootTabletIT.test(BigRootTabletIT.java:50)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
      

      Looking at the logs, the root tablet is assigned successfully:

      2014-02-26 05:17:09,414 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: +r<<@(tserver1:9997[1446db2884a0002],null,null)
      2014-02-26 05:17:09,596 [master.EventCoordinator] INFO : tablet +r<< was loaded on tserver1:9997
      

      No other tablets are assigned for the next four minutes.

      The logs are full of "Failed to bin" errors:

      2014-02-26 05:19:09,613 [impl.ThriftTransportPool] TRACE: Using existing connection to tserver1:9997
      2014-02-26 05:19:09,615 [impl.ThriftTransportPool] TRACE: Returned connection tserver1:9997 (120000) ioCount : 562
      2014-02-26 05:19:09,615 [metadata.MetadataLocationObtainer] TRACE: tid=28 oid=3448  Got 2 results  from +r<< in 0.002 secs
      2014-02-26 05:19:09,615 [impl.TabletLocatorImpl] TRACE: tid=28 oid=3446  Binned 1 ranges for table !0 to 0 tservers in 0.003 secs
      2014-02-26 05:19:09,616 [impl.TabletServerBatchReaderIterator] TRACE: Failed to bin 1 ranges, tablet locations were null, retrying in 100ms
      

      There is an IOException while trying to do a batch read:

      2014-02-26 05:19:09,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server : tserver1:9997 msg : java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
      2014-02-26 05:19:09,689 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
      java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
              at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:713)
              at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:372)
              at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
              at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
              at java.lang.Thread.run(Thread.java:744)
      Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
              at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
              at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
              at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
              at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:311)
              at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:291)
              at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:658)
              ... 7 more
      Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
              at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
              ... 18 more
      2014-02-26 05:19:09,693 [impl.TabletServerBatchReaderIterator] TRACE: Failed to execute multiscans against 1 tablets, retrying...
      

      This would appear to be the batch scanner used to read the root table in the master.

      The tablet server hosting the root tablet is being successfully scanned more than 24 times a second, presumably by clients.

      There are no errors in the tserver logs.

          Activity

          ecn Eric Newton added a comment -

          After moving the static final objects to their own class, the test no longer hangs (well, at least 48 out of 48 attempts).
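A minimal sketch of the kind of change described in this comment, under stated assumptions: the names Converter, LengthConverter, Converters and ConvertersExample are hypothetical and only stand in for whatever abstract class held the static final members. The idea shown is that the abstract base no longer creates instances of its own subclasses during its static initialization; a separate holder class owns them instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstract base. Before the change it would have carried a
// member such as:
//   static final Converter<String, Integer> LENGTH = new LengthConverter();
// which makes initializing the base class depend on initializing a subclass.
abstract class Converter<I, O> {
    abstract O convert(I input);

    // Applies the converter to every element of the input list.
    static <I, O> List<O> convertAll(List<I> inputs, Converter<I, O> converter) {
        List<O> out = new ArrayList<>(inputs.size());
        for (I input : inputs) {
            out.add(converter.convert(input));
        }
        return out;
    }
}

// Concrete implementation; initializing it initializes Converter first.
final class LengthConverter extends Converter<String, Integer> {
    @Override
    Integer convert(String input) {
        return input.length();
    }
}

// After the change: the shared instances live in their own class, so
// Converter's static initialization no longer touches any subclass.
final class Converters {
    static final LengthConverter LENGTH = new LengthConverter();

    private Converters() {}
}

class ConvertersExample {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("root");
        words.add("metadata");
        System.out.println(Converter.convertAll(words, Converters.LENGTH));
    }
}
```

With this layout, a thread that instantiates a subclass directly and a thread that first touches the shared instances both initialize the base class before anything that depends on it, so the two initialization orders can no longer cross.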

          jira-bot ASF subversion and git services added a comment -

          Commit 4e7251601f9b51318d7e5373468dd852f12d0e0c in accumulo's branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=4e72516 ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          jira-bot ASF subversion and git services added a comment -

          Commit 0bd62e390b737d66b2734d3890dbd4f3ca846589 in accumulo's branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0bd62e3 ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          jira-bot ASF subversion and git services added a comment -

          Commit 57b2e5c7751e2df03b65db6decdb932ba8394802 in accumulo's branch refs/heads/master from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57b2e5c ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          jira-bot ASF subversion and git services added a comment -

          Commit 4e7251601f9b51318d7e5373468dd852f12d0e0c in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=4e72516 ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          jira-bot ASF subversion and git services added a comment -

          Commit 0bd62e390b737d66b2734d3890dbd4f3ca846589 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0bd62e3 ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          jira-bot ASF subversion and git services added a comment -

          Commit 57b2e5c7751e2df03b65db6decdb932ba8394802 in accumulo's branch refs/heads/1.6.0-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57b2e5c ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          jira-bot ASF subversion and git services added a comment -

          Commit 0bd62e390b737d66b2734d3890dbd4f3ca846589 in accumulo's branch refs/heads/1.5.1-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=0bd62e3 ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          jira-bot ASF subversion and git services added a comment -

          Commit 57b2e5c7751e2df03b65db6decdb932ba8394802 in accumulo's branch refs/heads/1.5.1-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57b2e5c ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          jira-bot ASF subversion and git services added a comment -

          Commit 57b2e5c7751e2df03b65db6decdb932ba8394802 in accumulo's branch refs/heads/1.4.5-SNAPSHOT from Eric Newton
          [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57b2e5c ]

          ACCUMULO-2408 organize static final members of abstract class to a concrete implementation

          ecn Eric Newton added a comment -

          A bit of googling turned up this report, which I don't fully understand yet, but it seems to be related to a deadlock that can occur when multiple threads run the static initializers of a class at the same time.
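The general shape of such a class-initialization deadlock can be sketched in a few lines. This is a hypothetical illustration, not Accumulo code: the names InitCycleDemo, A, B and hangs are invented here, and the latch exists only to force the bad interleaving every time. The JVM lets only one thread initialize a class and makes any other thread that needs the class wait until initialization finishes, so two threads that each hold one in-progress initialization and need the other's class wait on each other forever.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo, not Accumulo code: two classes whose static
// initializers depend on each other, initialized from two threads.
final class InitCycleDemo {
    // Opens once both threads are inside a static initializer, which makes
    // the bad interleaving deterministic instead of a rare race.
    private static final CountDownLatch BOTH_INSIDE = new CountDownLatch(2);

    private static void rendezvous() {
        BOTH_INSIDE.countDown();
        try {
            BOTH_INSIDE.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static final class A {
        static final int VALUE;
        static {
            rendezvous();
            VALUE = B.VALUE + 1; // needs B, which the other thread is initializing
        }
    }

    static final class B {
        static final int VALUE;
        static {
            rendezvous();
            VALUE = A.VALUE + 1; // needs A, which the other thread is initializing
        }
    }

    // Starts the two initializations and reports whether both threads are
    // still blocked after waiting up to waitMillis for each of them.
    static boolean hangs(long waitMillis) {
        Thread t1 = new Thread(() -> System.out.println(A.VALUE), "init-A");
        Thread t2 = new Thread(() -> System.out.println(B.VALUE), "init-B");
        // Daemon threads, so the stuck threads do not keep the JVM alive.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(waitMillis);
            t2.join(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads stuck: " + hangs(1000));
    }
}
```

In real code there is no latch, so the hang only appears when the two initializations happen to overlap, which would be consistent with a test that fails only once in a dozen or more runs.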

          ecn Eric Newton added a comment -

          Here's the stuck stack with "jstack -m"

          ----------------- 41937 -----------------
          0x000000351f00b43c      __pthread_cond_wait + 0xcc
          0x00007f62d031891d      _ZN13ObjectMonitor4waitElbP6Thread + 0x9bd
          0x00007f62d00ce23b      _ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread + 0x36b
          0x00007f62d00ce55a      _ZN13instanceKlass10initializeEP6Thread + 0x6a
          0x00007f62d01055f3      _ZN18InterpreterRuntime4_newEP10JavaThreadP19constantPoolOopDesci + 0x153
          0x00007f62cc1b0181      * org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startMultiScan(org.apache.accumulo.trace.thrift.TInfo, org.apache.accumulo.core.security.thrift.TCredentials, java.util.Map, java.util.List, java.util.List, java.util.Map, java.util.List, boolean) bci:221 line:1355 (Interpreted frame)
          0x00007f62cc1924e7      <StubRoutines>
          0x00007f62d010d465      _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x365
          0x00007f62d010bec8      _ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread + 0x28
          0x00007f62d039e20f      _ZN10Reflection6invokeE19instanceKlassHandle12methodHandle6Handleb14objArrayHandle9BasicTypeS3_bP6Thread + 0x47f
          0x00007f62d039efc0      _ZN10Reflection13invoke_methodEP7oopDesc6Handle14objArrayHandleP6Thread + 0x160
          0x00007f62d0194af4      JVM_InvokeMethod + 0x224
          0x00007f62cc1a4738      * sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) bci:0 (Interpreted frame)
          0x00007f62cc198233      * sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) bci:87 line:57 (Interpreted frame)
          0x00007f62cc198233      * sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) bci:6 line:43 (Interpreted frame)
          0x00007f62cc1988e1      * java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) bci:57 line:606 (Interpreted frame)
          0x00007f62cc198233      * org.apache.accumulo.trace.instrument.thrift.TraceWrap$1.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) bci:64 line:63 (Interpreted frame)
          0x00007f62cc1988e1      * com.sun.proxy.$Proxy9.startMultiScan(org.apache.accumulo.trace.thrift.TInfo, org.apache.accumulo.core.security.thrift.TCredentials, java.util.Map, java.util.List, java.util.List, java.util.Map, java.util.List, boolean) bci:55 (Interpreted frame)
          0x00007f62cc1988e1      * org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Iface, org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_args) bci:42 line:2252 (Interpreted frame)
          0x00007f62cc198233      * org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(java.lang.Object, org.apache.thrift.TBase) bci:9 line:2236 (Interpreted frame)
          0x00007f62cc198233      * org.apache.thrift.ProcessFunction.process(int, org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, java.lang.Object) bci:86 line:39 (Interpreted frame)
          0x00007f62cc198058      * org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol) bci:126 line:39 (Interpreted frame)
          0x00007f62cc1989fe      * org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol) bci:37 line:171 (Interpreted frame)
          0x00007f62cc1989fe      * org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke() bci:49 line:478 (Interpreted frame)
          0x00007f62cc198058      * org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run() bci:74 line:231 (Interpreted frame)
          0x00007f62cc198706      * java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) bci:95 line:1145 (Interpreted frame)
          0x00007f62cc198058      * java.util.concurrent.ThreadPoolExecutor$Worker.run() bci:5 line:615 (Interpreted frame)
          0x00007f62cc198706      * org.apache.accumulo.trace.instrument.TraceRunnable.run() bci:51 line:47 (Interpreted frame)
          0x00007f62cc198706      * org.apache.accumulo.core.util.LoggingRunnable.run() bci:4 line:34 (Interpreted frame)
          0x00007f62cc198706      * java.lang.Thread.run() bci:11 line:744 (Interpreted frame)
          0x00007f62cc1924e7      <StubRoutines>
          0x00007f62d010d465      _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x365
          0x00007f62d010bec8      _ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread + 0x28
          0x00007f62d010c197      _ZN9JavaCalls12call_virtualEP9JavaValue11KlassHandleP6SymbolS4_P17JavaCallArgumentsP6Thread + 0x197
          0x00007f62d010c2b7      _ZN9JavaCalls12call_virtualEP9JavaValue6Handle11KlassHandleP6SymbolS5_P6Thread + 0x47
          0x00007f62d01881c5      _ZL12thread_entryP10JavaThreadP6Thread + 0xe5
          0x00007f62d04625ff      _ZN10JavaThread17thread_main_innerEv + 0xdf
          0x00007f62d0462705      _ZN10JavaThread3runEv + 0xf5
          0x00007f62d032a538      _ZL10java_startP6Thread + 0x108
          

          Reproduced on jdk7.0u51.

          ecn Eric Newton added a comment -

          Seems to be the same issue as ACCUMULO-1861.

          ecn Eric Newton added a comment -

          Managed to reproduce this after a dozen runs. The multiscan request is stuck here:

          "ClientPool 11" daemon prio=10 tid=0x00007f9c14011800 nid=0xf77c in Object.wait() [0x00007f9bf79f7000]
             java.lang.Thread.State: RUNNABLE
                  at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startMultiScan(TabletServer.java:1355)
                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.apache.accumulo.trace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:63)
                  at $Proxy9.startMultiScan(Unknown Source)
                  at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2252)
                  at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2236)
                  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
                  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
                  at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:171)
                  at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:478)
                  at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:231)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
                  at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
                  at java.lang.Thread.run(Thread.java:662)
          

          Which corresponds to this line:

                Map<KeyExtent,List<Range>> batch = Translator.translate(tbatch, new TKeyExtentTranslator(), new Translator.ListTranslator<TRange,Range>(
                    new TRangeTranslator()));
          
          ecn Eric Newton added a comment -

          The tablet server never reported a successful MultiScan.

          ecn Eric Newton added a comment -

          Failure was against Hadoop 1.2.1:

          mvn -Dhadoop.profile=1 -Dhadoop.version=1.2.1 clean verify
          

            People

            • Assignee:
              ecn Eric Newton
              Reporter:
              ecn Eric Newton