HDFS-4723 (project Hadoop HDFS; sub-task of HDFS-15646, Track failing tests in HDFS)

Occasional failure in TestDFSClientRetries#testGetFileChecksum because the number of available xcievers is set too low


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.4-alpha, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: test
    • Labels: None
    • Release Note:
      I cannot reproduce the same stack trace. HDFS-15461 addresses the same unit test, which fails there with a different stack trace.

    Description

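      Occasional failure in TestDFSClientRetries#testGetFileChecksum because the limit on concurrent xcievers is set too low: in the log below, a DataNode rejects a connection because its xceiver count (3) exceeds the limit of 2.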

      2013-04-21 18:48:28,273 WARN  datanode.DataNode (DataXceiverServer.java:run(161)) - 127.0.0.1:37608:DataXceiverServer: 
      java.io.IOException: Xceiver count 3 exceeds the limit of concurrent xcievers: 2
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:143)
      	at java.lang.Thread.run(Thread.java:662)
      2013-04-21 18:48:28,274 INFO  datanode.DataNode (DataXceiver.java:writeBlock(453)) - Datanode 2 got response for connect ack  from downstream datanode with firstbadlink as 127.0.0.1:37608
      2013-04-21 18:48:28,276 INFO  datanode.DataNode (DataXceiver.java:writeBlock(491)) - Datanode 2 forwarding connect ack to upstream firstbadlink is 127.0.0.1:37608
      2013-04-21 18:48:28,276 ERROR datanode.DataNode (DataXceiver.java:writeBlock(477)) - DataNode{data=FSDataset{dirpath='[/home/ec2-user/jenkins/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/current, /home/ec2-user/jenkins/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data4/current]'}, localName='127.0.0.1:33298', storageID='DS-1506063529-10.174.86.97-33298-1366570107286', xmitsInProgress=0}:Exception transfering block BP-2121022065-10.174.86.97-1366570107029:blk_6876843860808656778_1071 to mirror 127.0.0.1:37608: java.io.EOFException: Premature EOF: no length prefix available
      2013-04-21 18:48:28,276 INFO  hdfs.DFSClient (DFSOutputStream.java:createBlockOutputStream(1105)) - Exception in createBlockOutputStream
      java.io.IOException: Bad connect ack with firstBadLink as 127.0.0.1:37608
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1096)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1019)
      	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
      2013-04-21 18:48:28,276 INFO  datanode.DataNode (DataXceiver.java:writeBlock(537)) - opWriteBlock BP-2121022065-10.174.86.97-1366570107029:blk_6876843860808656778_1071 received exception java.io.EOFException: Premature EOF: no length prefix available
      2013-04-21 18:48:28,277 INFO  datanode.DataNode (BlockReceiver.java:receiveBlock(674)) - Exception for BP-2121022065-10.174.86.97-1366570107029:blk_6876843860808656778_1071
      java.io.IOException: Premature EOF from inputStream
      	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:644)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
      	at java.lang.Thread.run(Thread.java:662)
      2013-04-21 18:48:28,277 INFO  hdfs.DFSClient (DFSOutputStream.java:nextBlockOutputStream(1022)) - Abandoning BP-2121022065-10.174.86.97-1366570107029:blk_6876843860808656778_1071
      2013-04-21 18:48:28,277 ERROR datanode.DataNode (DataXceiver.java:run(223)) - 127.0.0.1:33298:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:55182 dest: /127.0.0.1:33298
      java.io.EOFException: Premature EOF: no length prefix available
      	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1340)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:448)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
      	at java.lang.Thread.run(Thread.java:662)
      2013-04-21 18:48:28,277 INFO  datanode.DataNode (BlockReceiver.java:run(950)) - PacketResponder: BP-2121022065-10.174.86.97-1366570107029:blk_6876843860808656778_1071, type=HAS_DOWNSTREAM_IN_PIPELINE
      java.io.EOFException: Premature EOF: no length prefix available
      	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1340)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:894)
      	at java.lang.Thread.run(Thread.java:662)
      2013-04-21 18:48:28,278 INFO  datanode.DataNode (BlockReceiver.java:run(962)) - PacketResponder: BP-2121022065-10.174.86.97-1366570107029:blk_6876843860808656778_1071, type=HAS_DOWNSTREAM_IN_PIPELINE: Thread is interrupted.
      2013-04-21 18:48:28,278 INFO  datanode.DataNode (BlockReceiver.java:run(1043)) - PacketResponder: BP-2121022065-10.174.86.97-1366570107029:blk_6876843860808656778_1071, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
      2013-04-21 18:48:28,278 INFO  datanode.DataNode (DataXceiver.java:writeBlock(537)) - opWriteBlock BP-2121022065-10.174.86.97-1366570107029:blk_6876843860808656778_1071 received exception java.io.IOException: Premature EOF from inputStream
      2013-04-21 18:48:28,278 ERROR datanode.DataNode (DataXceiver.java:run(223)) - 127.0.0.1:58102:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:47124 dest: /127.0.0.1:58102
      java.io.IOException: Premature EOF from inputStream
      	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
      	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:644)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
      	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
      	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
      	at java.lang.Thread.run(Thread.java:662)
      2013-04-21 18:48:28,279 INFO  hdfs.DFSClient (DFSOutputStream.java:nextBlockOutputStream(1025)) - Excluding datanode 127.0.0.1:37608
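The first log entry shows a DataNode refusing a connection once its xceiver count exceeds the configured limit. The sketch below is a simplified, hypothetical model of that admission check, not Hadoop's `DataXceiverServer` code; the class `XceiverLimitModel` and its methods are illustrative names. It only reproduces the arithmetic in the message: with a limit of 2, the third concurrent connection is rejected.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of a per-DataNode limit on concurrent xceivers.
 * Not the Hadoop implementation; it mirrors only the counting behind
 * "Xceiver count 3 exceeds the limit of concurrent xcievers: 2".
 */
final class XceiverLimitModel {
    private final int maxXceivers;                    // assumed configured limit
    private final AtomicInteger active = new AtomicInteger();

    XceiverLimitModel(int maxXceivers) {
        this.maxXceivers = maxXceivers;
    }

    /** Admits one more connection, or throws if that would exceed the limit. */
    void admit() throws IOException {
        int count = active.incrementAndGet();         // count including the new connection
        if (count > maxXceivers) {
            active.decrementAndGet();                 // roll back: the connection is refused
            throw new IOException("Xceiver count " + count
                + " exceeds the limit of concurrent xcievers: " + maxXceivers);
        }
    }

    /** Releases a connection that has finished. */
    void release() {
        active.decrementAndGet();
    }

    int activeCount() {
        return active.get();
    }

    public static void main(String[] args) {
        XceiverLimitModel dn = new XceiverLimitModel(2);
        for (int i = 1; i <= 3; i++) {
            try {
                dn.admit();
                System.out.println("connection " + i + " admitted");
            } catch (IOException e) {
                System.out.println("connection " + i + " rejected: " + e.getMessage());
            }
        }
    }
}
```

In this model, raising the limit (for example `new XceiverLimitModel(3)`) lets all three concurrent connections through, which is the direction the issue title points to.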
      

      As a consequence of this failure, the client excludes DataNode 127.0.0.1:37608 from the pipeline, and from this point on there are not enough DataNodes left to place all replicas:

      2013-04-21 18:48:54,288 WARN  blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(232)) - Not able to place enough replicas, still in need of 1 to reach 3
      
      ...
      

      and the test eventually times out.
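The placement warning follows from simple counting. The failed pipeline in the log involves three DataNodes (127.0.0.1:58102, 127.0.0.1:33298 and 127.0.0.1:37608); assuming the test cluster has no others, excluding 127.0.0.1:37608 leaves two candidates for a replication factor of 3, hence "still in need of 1 to reach 3". The sketch below is a hypothetical model of that count (`PlacementShortfall` and its methods are illustrative names, not the `BlockPlacementPolicyDefault` API) and ignores racks, load and every other real placement rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of why replica placement falls short once a DataNode
 * is excluded. Real HDFS placement also considers racks, load and storage.
 */
final class PlacementShortfall {
    /** Picks up to {@code replication} live nodes that are not excluded, in list order. */
    static List<String> chooseTargets(List<String> live, Set<String> excluded, int replication) {
        List<String> chosen = new ArrayList<>();
        for (String dn : live) {
            if (chosen.size() == replication) {
                break;
            }
            if (!excluded.contains(dn)) {
                chosen.add(dn);
            }
        }
        return chosen;
    }

    /** Number of replicas that could not be placed. */
    static int shortfall(List<String> chosen, int replication) {
        return Math.max(0, replication - chosen.size());
    }

    public static void main(String[] args) {
        // Assumption: the cluster consists of exactly the three DataNodes seen in the log.
        List<String> live = Arrays.asList("127.0.0.1:58102", "127.0.0.1:33298", "127.0.0.1:37608");
        Set<String> excluded = Collections.singleton("127.0.0.1:37608");
        int replication = 3;

        List<String> chosen = chooseTargets(live, excluded, replication);
        System.out.println("chosen " + chosen + ", still in need of "
            + shortfall(chosen, replication) + " to reach " + replication);
    }
}
```

Because the excluded node is never reconsidered in this model, every later block allocation hits the same shortfall, which is consistent with the test stalling until it times out.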

      Attachments

        1. 4723.patch (1 kB, Andrew Kyle Purtell)
        2. 4723-branch-2.patch (1 kB, Andrew Kyle Purtell)

            People

              Assignee: Unassigned
              Reporter: Andrew Kyle Purtell (apurtell)
              Votes: 0
              Watchers: 4