Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-3775

Root tablet had 6,974 walogs

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • None
    • 1.8.0
    • None
    • Same as ACCUMULO-3774

    Description

      Before the deadlock described in ACCUMULO-3774, the root tablet recovered 6,974 walogs. Almost all of theses were empty. Before the tserver was killed there were thousands of messages like the following (I think this was caused by datanode agitation).

      2015-05-05 18:02:43,236 [log.TabletServerLogger] INFO : Using next log hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/a13aee79-c313-4298-b55a-8ec58ffb977c
      2015-05-05 18:02:43,236 [log.TabletServerLogger] DEBUG: Creating next WAL
      2015-05-05 18:02:43,236 [tserver.TabletServer] INFO : Writing log marker for level ROOT hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/a13aee79-c313-4298-b55a-8ec58ffb977c
      2015-05-05 18:02:43,236 [log.DfsLogger] DEBUG: Address is worker10:9997
      2015-05-05 18:02:43,236 [log.DfsLogger] DEBUG: DfsLogger.open() begin
      2015-05-05 18:02:43,236 [util.MetadataTableUtil] DEBUG: Adding log entry hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/a13aee79-c313-4298-b55a-8ec58ffb977c
      2015-05-05 18:02:43,237 [fs.VolumeManagerImpl] DEBUG: creating hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/295244ee-c9e3-404f-a3d8-9569e41ba8e1 with CreateFlag set: [CREATE, SYNC_BLOCK]
      2015-05-05 18:02:43,246 [tserver.TabletServer] INFO : Writing log marker for level NORMAL hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/a13aee79-c313-4298-b55a-8ec58ffb977c
      2015-05-05 18:02:43,247 [util.MetadataTableUtil] DEBUG: Adding log entry hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/a13aee79-c313-4298-b55a-8ec58ffb977c
      2015-05-05 18:02:43,247 [log.DfsLogger] DEBUG: No enciphering, using raw output stream
      2015-05-05 18:02:43,247 [log.DfsLogger] DEBUG: Got new write-ahead log: worker10:9997/hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/295244ee-c9e3-404f-a3d8-9569e41ba8e1
      2015-05-05 18:02:43,250 [hdfs.DFSClient] WARN : DataStreamer Exception
      org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/worker10+9997/a13aee79-c313-4298-b55a-8ec58ffb977c could only be replicated to 2 nodes instead of minReplication (=3).  There are 16 datanode(s) running and n
      o node(s) are excluded in this operation.
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3067)
              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:722)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
              at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
      
              at org.apache.hadoop.ipc.Client.call(Client.java:1476)
              at org.apache.hadoop.ipc.Client.call(Client.java:1407)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
              at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
              at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
              at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
              at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
              at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
      
      2015-05-05 18:02:43,352 [log.TabletServerLogger] INFO : Using next log hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/295244ee-c9e3-404f-a3d8-9569e41ba8e1
      2015-05-05 18:02:43,352 [log.TabletServerLogger] DEBUG: Creating next WAL
      2015-05-05 18:02:43,352 [tserver.TabletServer] INFO : Writing log marker for level ROOT hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/295244ee-c9e3-404f-a3d8-9569e41ba8e1
      2015-05-05 18:02:43,352 [log.DfsLogger] DEBUG: Address is worker10:9997
      2015-05-05 18:02:43,352 [log.DfsLogger] DEBUG: DfsLogger.open() begin
      2015-05-05 18:02:43,353 [util.MetadataTableUtil] DEBUG: Adding log entry hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/295244ee-c9e3-404f-a3d8-9569e41ba8e1
      2015-05-05 18:02:43,353 [fs.VolumeManagerImpl] DEBUG: creating hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/1810b018-26e3-4728-bbab-e3d901e3edd3 with CreateFlag set: [CREATE, SYNC_BLOCK]
      2015-05-05 18:02:43,362 [log.DfsLogger] DEBUG: No enciphering, using raw output stream
      2015-05-05 18:02:43,362 [log.DfsLogger] DEBUG: Got new write-ahead log: worker10:9997/hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/1810b018-26e3-4728-bbab-e3d901e3edd3
      2015-05-05 18:02:43,366 [log.TabletServerLogger] DEBUG: Created next WAL hdfs://10.1.5.21:10000/accumulo/wal/worker10+9997/1810b018-26e3-4728-bbab-e3d901e3edd3
      2015-05-05 18:02:43,366 [hdfs.DFSClient] WARN : DataStreamer Exception
      org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/worker10+9997/295244ee-c9e3-404f-a3d8-9569e41ba8e1 could only be replicated to 2 nodes instead of minReplication (=3).  There are 16 datanode(s) running and no node(s) are excluded in this operation.
              at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3067)
              at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:722)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
              at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
              at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
              at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
              at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
      
              at org.apache.hadoop.ipc.Client.call(Client.java:1476)
              at org.apache.hadoop.ipc.Client.call(Client.java:1407)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
              at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
              at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
              at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
              at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
              at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
              at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
              at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
      

      Attachments

        1. ACCUMULO_3775-01.patch
          2 kB
          Eric C. Newton
        2. ACCUMULO_3775_02.patch
          4 kB
          Josh Elser
        3. ACCUMULO-3775-3.patch
          5 kB
          Josh Elser

        Issue Links

          Activity

            People

              ecn Eric C. Newton
              kturner Keith Turner
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m