Accumulo / ACCUMULO-3148

TabletServer didn't get Session expired in HalfDeadTServerIT


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 1.7.0
    • Component/s: test
    • Labels: None

    Description

      Been seeing spurious failures in HalfDeadTServerIT where the tserver doesn't get the ZK session expiration:

      2014-09-15 09:39:59,201 [tserver.TabletServer] DEBUG: ScanSess tid 172.31.33.94:35957 !0 0 entries in 0.07 secs, nbTimes = [63 63 63.00 1] 
      sleeping
      [... "sleeping" printed 75 times in total ...]
      2014-09-15 09:40:20,088 [tserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
      2014-09-15 09:40:20,088 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
      org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/d0b9b8e7-9869-4b00-9ae7-317f5231f2c1/tables/1/conf/table.iterator.minc.vers.opt.maxVersions
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
      	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:261)
      	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:153)
      	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
      	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
      	at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:114)
      	at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.getProperties(ZooCachePropertyAccessor.java:144)
      	at org.apache.accumulo.server.conf.TableConfiguration.getProperties(TableConfiguration.java:108)
      	at org.apache.accumulo.core.conf.AccumuloConfiguration.iterator(AccumuloConfiguration.java:69)
      	at org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:40)
      	at org.apache.accumulo.server.conf.ServerConfigurationFactory.getTableConfiguration(ServerConfigurationFactory.java:155)
      	at org.apache.accumulo.server.conf.ServerConfiguration.getTableConfiguration(ServerConfiguration.java:69)
      	at org.apache.accumulo.tserver.TabletServer.getTableConfiguration(TabletServer.java:3983)
      	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1277)
      	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1256)
      	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1112)
      	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1089)
      	at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2935)
      	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
      	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
      	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
      	at java.lang.Thread.run(Thread.java:745)
      2014-09-15 09:40:20,090 [tserver.TabletServer] WARN : Check for long GC pauses not called in a timely fashion. Expected every 5.0 seconds but was 16.3 seconds since last check
      2014-09-15 09:40:20,477 [datanode.DataNode] ERROR: 127.0.0.1:57185:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:42146 dst: /127.0.0.1:57185
      java.io.IOException: Premature EOF from inputStream
      	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:771)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:718)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
      	at java.lang.Thread.run(Thread.java:745)
      

      It looks like the tserver killed itself after the ZooKeeper connection loss (exiting with reason = LOCK_DELETED) but before it could reconnect and receive the session expiration that the test expects.
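The ordering problem can be sketched as a small model. This is a hypothetical illustration, not Accumulo's TabletServer or ZooKeeper client code: the `Event` names, the `exitReason` method, and the rule that the first fatal event ends the process are assumptions chosen to mirror the log (a retried `ConnectionLoss`, then `FATAL ... reason = LOCK_DELETED`). It only shows why an exit triggered by an earlier fatal event means the session expiration is never observed.

```java
import java.util.List;

/**
 * Hypothetical model of the exit-ordering race (not Accumulo code).
 * Event names and exit rules are illustrative assumptions.
 */
class LockLossOrdering {

    // Simplified events a tserver's ZooKeeper client might observe.
    enum Event { CONNECTION_LOSS, LOCK_DELETED, SESSION_EXPIRED }

    /**
     * Replays events in the order observed and returns the event that made
     * the process exit, or null if it never exited. CONNECTION_LOSS is
     * treated as recoverable (the client retries, as the ZooCache
     * "will retry" warning suggests); LOCK_DELETED and SESSION_EXPIRED
     * are treated as fatal.
     */
    static Event exitReason(List<Event> observed) {
        for (Event e : observed) {
            if (e == Event.LOCK_DELETED || e == Event.SESSION_EXPIRED) {
                return e; // first fatal event wins; later events are never seen
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Order the test expects: connection loss, then the session expiration.
        List<Event> expected = List.of(Event.CONNECTION_LOSS, Event.SESSION_EXPIRED);
        // Order in the failing run: the lock loss is acted on first, so the
        // process exits before SESSION_EXPIRED is delivered.
        List<Event> failing = List.of(
                Event.CONNECTION_LOSS, Event.LOCK_DELETED, Event.SESSION_EXPIRED);

        System.out.println(exitReason(expected)); // SESSION_EXPIRED
        System.out.println(exitReason(failing));  // LOCK_DELETED
    }
}
```

In this model, a check that only accepts SESSION_EXPIRED as the exit reason passes for the first ordering and fails for the second, which matches the spurious failure pattern described in this issue.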

          People

            Assignee: Josh Elser (elserj)
            Reporter: Josh Elser (elserj)
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m