Hadoop Common
  HADOOP-3707

Frequent DiskOutOfSpaceException on almost-full datanodes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.2
    • Component/s: None
    • Labels:
      None
    • Release Note:
      The NameNode keeps a count of the blocks scheduled to be written to each datanode and uses it to avoid allocating more blocks than the datanode can hold.
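
The accounting described in the release note might look roughly like the sketch below. This is only an illustration and not the code from the attached patches: the class `ScheduledBlockTracker`, its method names, the fixed block size, and the idea of subtracting scheduled blocks from the last reported free space are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-datanode scheduled-block accounting; not Hadoop's actual classes. */
class ScheduledBlockTracker {
    private final long blockSize;
    // Free space each datanode last reported (assumed to arrive via heartbeats).
    private final Map<String, Long> remainingBytes = new HashMap<>();
    // Blocks handed out to each datanode that have not finished yet.
    private final Map<String, Integer> scheduledBlocks = new HashMap<>();

    ScheduledBlockTracker(long blockSize) {
        this.blockSize = blockSize;
    }

    /** Record the free space a datanode reported most recently. */
    void reportRemaining(String node, long bytes) {
        remainingBytes.put(node, bytes);
    }

    /** True if the node can hold one more block on top of those already scheduled. */
    boolean hasRoomForBlock(String node) {
        long remaining = remainingBytes.getOrDefault(node, 0L);
        long reserved = blockSize * scheduledBlocks.getOrDefault(node, 0);
        return remaining - reserved >= blockSize;
    }

    /** Schedule a block on the node if it fits; returns false when the node is skipped. */
    boolean tryScheduleBlock(String node) {
        if (!hasRoomForBlock(node)) {
            return false;
        }
        scheduledBlocks.merge(node, 1, Integer::sum);
        return true;
    }

    /** Call when a scheduled block has been received or its write was abandoned. */
    void blockFinished(String node) {
        // Decrement the count, dropping the entry when it reaches zero.
        scheduledBlocks.computeIfPresent(node, (k, v) -> v > 1 ? v - 1 : null);
    }
}
```

With a 64-byte block size and 100 bytes reported free, the first `tryScheduleBlock` call succeeds and the second is refused until the pending block finishes and the node reports enough free space again.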

      Description

      On a datanode that is completely full (apart from its reserved space), we frequently see the following.

      Target node reporting:

      2008-07-07 16:54:44,707 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_3328886742742952100 src: /11.1.11.111:22222 dest: /11.1.11.111:22222
      2008-07-07 16:54:44,708 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_3328886742742952100 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
      2008-07-07 16:54:44,708 ERROR org.apache.hadoop.dfs.DataNode: 33.3.33.33:22222:DataXceiver: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
              at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:444)
              at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:716)
              at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2187)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1113)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
              at java.lang.Thread.run(Thread.java:619)
      

      Sender reporting:

      2008-07-07 16:54:44,712 INFO org.apache.hadoop.dfs.DataNode: 11.1.11.111:22222:Exception writing block blk_3328886742742952100 to mirror 33.3.33.33:22222
      java.io.IOException: Broken pipe
              at sun.nio.ch.FileDispatcher.write0(Native Method)
              at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
              at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
              at sun.nio.ch.IOUtil.write(IOUtil.java:75)
              at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
              at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
              at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
              at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
              at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
              at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
              at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
              at java.io.DataOutputStream.write(DataOutputStream.java:90)
              at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2292)
              at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2411)
              at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2476)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1204)
              at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
              at java.lang.Thread.run(Thread.java:619)
      

      Since this does not happen constantly, my guess is that whenever the datanode gets a small amount of free space, the namenode over-assigns blocks to it, which can fail the block pipeline.
      (Note: before 0.17, the namenode was much slower at assigning blocks.)
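
The guess above can be illustrated with a small simulation. It is a sketch of the suspected failure mode only, not an analysis of the real NameNode code: `OverAssignmentDemo`, `failedWrites`, and the assumption that every allocation decision between two reports sees the same stale free-space figure are all hypothetical.

```java
/** Hypothetical simulation of over-assignment from a stale free-space figure; not Hadoop code. */
class OverAssignmentDemo {
    /**
     * Returns how many assigned block writes would fail on the datanode when the
     * allocator checks every request against the same reported free space.
     */
    static int failedWrites(long reportedFree, long blockSize, int requests) {
        // Allocator side: each decision uses the unchanged, stale figure.
        int assigned = 0;
        for (int i = 0; i < requests; i++) {
            if (reportedFree >= blockSize) {
                assigned++;
            }
        }
        // Datanode side: the real free space shrinks with each accepted block.
        long actualFree = reportedFree;
        int failed = 0;
        for (int i = 0; i < assigned; i++) {
            if (actualFree >= blockSize) {
                actualFree -= blockSize;
            } else {
                failed++; // corresponds to an out-of-space rejection in the pipeline
            }
        }
        return failed;
    }
}
```

Under these assumptions, a node reporting room for one block that receives five assignments accepts the first and rejects the other four, while a node with ample space or no space at all sees no failures.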

      1. HADOOP-3707-trunk.patch
        8 kB
        Raghu Angadi
      2. HADOOP-3707-branch-018.patch
        8 kB
        Raghu Angadi
      3. HADOOP-3707-trunk.patch
        8 kB
        Raghu Angadi
      4. HADOOP-3707-branch-017.patch
        8 kB
        Raghu Angadi
      5. HADOOP-3707-trunk.patch
        8 kB
        Raghu Angadi
      6. HADOOP-3707-trunk.patch
        5 kB
        Raghu Angadi
      7. HADOOP-3707-trunk.patch
        8 kB
        Raghu Angadi
      8. HADOOP-3707-branch-017.patch
        8 kB
        Raghu Angadi
      9. HADOOP-3707-branch-017.patch
        5 kB
        Raghu Angadi

        Activity

        Owen O'Malley made changes -
        Component/s dfs [ 12310710 ]
        Owen O'Malley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Owen O'Malley made changes -
        Fix Version/s 0.18.0 [ 12312972 ]
        Fix Version/s 0.19.0 [ 12313211 ]
        Raghu Angadi made changes -
        Release Note NameNode keeps a count of number of blocks scheduled to be written to a datanode and uses it to avoid allocating more blocks than a datanode can hold.
        Fix Version/s 0.17.2 [ 12313296 ]
        Resolution Fixed [ 1 ]
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Raghu Angadi made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Raghu Angadi made changes -
        Attachment HADOOP-3707-trunk.patch [ 12386028 ]
        Raghu Angadi made changes -
        Attachment HADOOP-3707-branch-018.patch [ 12386026 ]
        Raghu Angadi made changes -
        Attachment HADOOP-3707-trunk.patch [ 12386017 ]
        Raghu Angadi made changes -
        Release Note committed this patch to 0.17. I will temporarily remove 0.17 from the 'fix versions' so that it does not appear as an unresolved blocker for 0.17.
        Raghu Angadi made changes -
        Fix Version/s 0.17.2 [ 12313296 ]
        Release Note committed this patch to 0.17. I will temporarily remove 0.17 from the 'fix versions' so that it does not appear as an unresolved blocker for 0.17.
        Raghu Angadi made changes -
        Attachment HADOOP-3707-branch-017.patch [ 12385814 ]
        Raghu Angadi made changes -
        Attachment HADOOP-3707-trunk.patch [ 12385795 ]
        Raghu Angadi made changes -
        Attachment HADOOP-3707-trunk.patch [ 12385698 ]
        Raghu Angadi made changes -
        Attachment HADOOP-3707-trunk.patch [ 12385682 ]
        Raghu Angadi made changes -
        Attachment HADOOP-3707-branch-017.patch [ 12385663 ]
        Raghu Angadi made changes -
        Attachment HADOOP-3707-branch-017.patch [ 12385552 ]
        Raghu Angadi made changes -
        Fix Version/s 0.17.2 [ 12313296 ]
        Fix Version/s 0.18.0 [ 12312972 ]
        Priority Major [ 3 ] Blocker [ 1 ]
        Fix Version/s 0.19.0 [ 12313211 ]
        Raghu Angadi made changes -
        Field Original Value New Value
        Assignee Raghu Angadi [ rangadi ]
        Koji Noguchi created issue -

          People

          • Assignee:
            Raghu Angadi
            Reporter:
            Koji Noguchi
          • Votes:
            0
            Watchers:
            0
