Hadoop Common / HADOOP-2976

Blocks staying underreplicated (for unclosed file)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.15.3
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels: None

    Description

      We had two files staying under-replicated for over a day.
      I checked that these under-replicated blocks were not corrupted.
      (They were both task tmp files and most likely didn't get closed.)

      Taking one file, /aaa/_task_200803040823_0001_r_000421_0/part-00421

      The namenode log (namenode.log.2008-03-04) showed

      2008-03-04 16:19:21,478 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /aaa/task_200803040823_0001_r_000421_0/part-00421. blk_-7848645760735416126
      2008-03-04 16:19:24,357 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 11.1.111.111:22222 is added to blk_-7848645760735416126

      On the datanode 11.1.111.111, it showed

      2008-03-04 16:19:24,358 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-7848645760735416126 from /55.55.55.55 and operation failed at /22.2.222.22

      On the second datanode 22.2.222.22, it showed

      2008-03-04 16:19:21,578 INFO org.apache.hadoop.dfs.DataNode: Exception writing to mirror 33.3.33.33
      java.net.SocketException: Connection reset
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
      at java.io.DataOutputStream.write(DataOutputStream.java:90)
      at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:1333)
      at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1386)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:938)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
      at java.lang.Thread.run(Thread.java:619)

      2008-03-04 16:19:24,358 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.net.SocketException: Broken pipe
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
      at java.io.DataOutputStream.flush(DataOutputStream.java:106)
      at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1394)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:938)
      at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
      at java.lang.Thread.run(Thread.java:619)
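
      Reading the logs together: the write pipeline broke between the second datanode (22.2.222.22) and its mirror (33.3.33.33), the failure propagated back, and only the first datanode's replica was ever registered via addStoredBlock. Because the task died without closing the file, the namenode presumably never re-ran a replication check on the block. Below is a minimal, hypothetical sketch of that gap and of the kind of fix the attached leaseExpiryReplication.patch name suggests (re-checking replication when a lease expires). All class, method, and field names here are illustrative stand-ins, not the actual org.apache.hadoop.dfs code.

      import java.util.ArrayDeque;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;
      import java.util.Queue;

      // Hypothetical stand-in for the namenode's replication bookkeeping;
      // none of these names are the real org.apache.hadoop.dfs code.
      public class LeaseExpirySketch {
          static final int TARGET_REPLICATION = 3;

          static class Block {
              final long id;
              int liveReplicas;
              Block(long id, int liveReplicas) {
                  this.id = id;
                  this.liveReplicas = liveReplicas;
              }
          }

          // File path -> its blocks, plus the queue the replication monitor drains.
          final Map<String, List<Block>> files = new HashMap<>();
          final Queue<Block> neededReplications = new ArrayDeque<>();

          // Queue a block for re-replication if it is below target.
          void checkReplication(Block b) {
              if (b.liveReplicas < TARGET_REPLICATION) {
                  neededReplications.add(b);
              }
          }

          // Normal path: the client closes the file and every block is checked.
          void completeFile(String src) {
              files.get(src).forEach(this::checkReplication);
          }

          // Failure path in this report: the task died, completeFile() never ran,
          // and nothing else queued the block. The fix the patch name suggests is
          // to run the same check when the namenode expires the lease and closes
          // the file on the client's behalf.
          void internalReleaseLease(String src) {
              files.get(src).forEach(this::checkReplication);
          }

          public static void main(String[] args) {
              LeaseExpirySketch ns = new LeaseExpirySketch();
              String src = "/aaa/_task_200803040823_0001_r_000421_0/part-00421";
              // Only one replica made it (addStoredBlock on 11.1.111.111).
              ns.files.put(src, List.of(new Block(-7848645760735416126L, 1)));
              ns.internalReleaseLease(src);  // lease expiry now triggers the check
              System.out.println("queued for re-replication: " + ns.neededReplications.size());
          }
      }

      With a check like the one in internalReleaseLease, the under-replicated block would be queued once the lease expires, instead of staying at a single replica indefinitely.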

    Attachments

      1. leaseExpiryReplication.patch (2 kB, Dhruba Borthakur)


    People

      Assignee: Dhruba Borthakur (dhruba)
      Reporter: Koji Noguchi (knoguchi)
      Votes: 0
      Watchers: 1
