Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 0.17.0
- Component/s: None
- Labels: None

Summary: NameNode keeps a count of the number of blocks scheduled to be written to a datanode and uses it to avoid allocating more blocks than the datanode can hold.
Description
On a datanode that is completely full (leaving only the reserved space), we frequently see the target node reporting:
2008-07-07 16:54:44,707 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_3328886742742952100 src: /11.1.11.111:22222 dest: /11.1.11.111:22222
2008-07-07 16:54:44,708 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_3328886742742952100 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
2008-07-07 16:54:44,708 ERROR org.apache.hadoop.dfs.DataNode: 33.3.33.33:22222:DataXceiver: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
    at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:444)
    at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:716)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2187)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1113)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
    at java.lang.Thread.run(Thread.java:619)
The sender reports:
2008-07-07 16:54:44,712 INFO org.apache.hadoop.dfs.DataNode: 11.1.11.111:22222:Exception writing block blk_3328886742742952100 to mirror 33.3.33.33:22222
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcher.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
    at sun.nio.ch.IOUtil.write(IOUtil.java:75)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2292)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2411)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2476)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1204)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
    at java.lang.Thread.run(Thread.java:619)
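For context, the DiskOutOfSpaceException above is thrown from FSVolumeSet.getNextVolume() (the top frame of the first trace) when no volume on the datanode can hold another full block. The following is a simplified, hypothetical sketch of that check, assuming getAvailable() already subtracts the configured reserved space; apart from the names that appear in the trace, everything here is illustrative, not the real FSDataset source.

    import java.io.IOException;

    // Illustrative sketch only; field and method names beyond those in the
    // stack trace are assumptions, not the actual Hadoop 0.17 code.
    interface FSVolume {
      // Free bytes on this volume minus the configured reserved space.
      long getAvailable() throws IOException;
    }

    class DiskOutOfSpaceException extends IOException {
      DiskOutOfSpaceException(String msg) { super(msg); }
    }

    class FSVolumeSet {
      private final FSVolume[] volumes;
      private int curVolume = 0;

      FSVolumeSet(FSVolume[] volumes) { this.volumes = volumes; }

      // Round-robin over the volumes; fail only when no volume can hold
      // one more full block.
      synchronized FSVolume getNextVolume(long blockSize) throws IOException {
        int start = curVolume;
        while (true) {
          FSVolume volume = volumes[curVolume];
          curVolume = (curVolume + 1) % volumes.length;
          if (volume.getAvailable() >= blockSize) {
            return volume;
          }
          if (curVolume == start) {
            // Every volume on this datanode is effectively full.
            throw new DiskOutOfSpaceException(
                "Insufficient space for an additional block");
          }
        }
      }
    }

Note that the check demands room for a full block up front, so a node with only a sliver of free space rejects the write here even though the namenode has already assigned it the block.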
Since this does not happen constantly, my guess is that whenever a datanode briefly has a small amount of space available, the namenode over-assigns blocks to it, which can then fail the block pipeline.
(Note that before 0.17, the namenode was much slower in assigning blocks.)
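A minimal sketch of the accounting described in the summary, assuming the namenode keeps per-datanode state like the class below (all names are illustrative, not the actual DatanodeDescriptor API): each block allocation increments a pending count, each block-received report decrements it, and target selection charges the pending blocks against the node's last reported free space.

    // Hedged sketch of the per-datanode "blocks scheduled" counter; the
    // names here are assumptions for illustration only.
    class DatanodeSketch {
      private long remainingBytes;     // free space from the last heartbeat
      private int blocksScheduled = 0; // allocated but not yet reported

      synchronized void heartbeat(long remaining) {
        this.remainingBytes = remaining;
      }

      // Called when the namenode allocates a block to this node.
      synchronized void incBlocksScheduled() {
        blocksScheduled++;
      }

      // Called when the node reports the block received (or the write fails).
      synchronized void decBlocksScheduled() {
        if (blocksScheduled > 0) blocksScheduled--;
      }

      // Target selection: the node must have room for the new block plus
      // every block already scheduled to it but not yet written.
      synchronized boolean hasRoomFor(long blockSize) {
        long scheduledBytes = (long) blocksScheduled * blockSize;
        return remainingBytes - scheduledBytes >= blockSize;
      }
    }

With this accounting, a small amount of space freed on an almost-full node no longer looks like room for many blocks, so the namenode should stop over-assigning and the pipeline failures above should go away.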