Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 0.17.0
- Component/s: None
- Labels: None

Summary: NameNode keeps a count of the number of blocks scheduled to be written to a datanode and uses it to avoid allocating more blocks than the datanode can hold.
Description
On a datanode that is completely full (leaving only the reserved space), we frequently see the target node reporting:
2008-07-07 16:54:44,707 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_3328886742742952100 src: /11.1.11.111:22222 dest: /11.1.11.111:22222
2008-07-07 16:54:44,708 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_3328886742742952100 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
2008-07-07 16:54:44,708 ERROR org.apache.hadoop.dfs.DataNode: 33.3.33.33:22222:DataXceiver: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
    at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:444)
    at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:716)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2187)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1113)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
    at java.lang.Thread.run(Thread.java:619)
The sender reports:
2008-07-07 16:54:44,712 INFO org.apache.hadoop.dfs.DataNode: 11.1.11.111:22222:Exception writing block blk_3328886742742952100 to mirror 33.3.33.33:22222
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcher.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
    at sun.nio.ch.IOUtil.write(IOUtil.java:75)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2292)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2411)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2476)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1204)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
    at java.lang.Thread.run(Thread.java:619)
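For context, the DiskOutOfSpaceException above is thrown from FSVolumeSet.getNextVolume() (the top frame of the first trace) when no volume on the datanode can hold another full block. The following is a simplified, hypothetical sketch of that check, assuming getAvailable() already subtracts the configured reserved space; apart from the names that appear in the trace, everything here is illustrative, not the real FSDataset source.

    import java.io.IOException;

    // Illustrative sketch only; field and method names beyond those in the
    // stack trace are assumptions, not the actual Hadoop 0.17 code.
    interface FSVolume {
      // Free bytes on this volume minus the configured reserved space.
      long getAvailable() throws IOException;
    }

    class DiskOutOfSpaceException extends IOException {
      DiskOutOfSpaceException(String msg) { super(msg); }
    }

    class FSVolumeSet {
      private final FSVolume[] volumes;
      private int curVolume = 0;

      FSVolumeSet(FSVolume[] volumes) { this.volumes = volumes; }

      // Round-robin over the volumes; fail only when no volume can hold
      // one more full block.
      synchronized FSVolume getNextVolume(long blockSize) throws IOException {
        int start = curVolume;
        while (true) {
          FSVolume volume = volumes[curVolume];
          curVolume = (curVolume + 1) % volumes.length;
          if (volume.getAvailable() >= blockSize) {
            return volume;
          }
          if (curVolume == start) {
            // Every volume on this datanode is effectively full.
            throw new DiskOutOfSpaceException(
                "Insufficient space for an additional block");
          }
        }
      }
    }

Note that the check demands room for a full block up front, so a node with only a sliver of free space rejects the write here even though the namenode has already assigned it the block.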
Since this does not happen constantly, my guess is that whenever a datanode briefly has a small amount of space available, the namenode over-assigns blocks to it, which can then fail the block pipeline.
(Note that before 0.17, the namenode was much slower in assigning blocks.)
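A minimal sketch of the accounting described in the summary, assuming the namenode keeps per-datanode state like the class below (all names are illustrative, not the actual DatanodeDescriptor API): each block allocation increments a pending count, each block-received report decrements it, and target selection charges the pending blocks against the node's last reported free space.

    // Hedged sketch of the per-datanode "blocks scheduled" counter; the
    // names here are assumptions for illustration only.
    class DatanodeSketch {
      private long remainingBytes;     // free space from the last heartbeat
      private int blocksScheduled = 0; // allocated but not yet reported

      synchronized void heartbeat(long remaining) {
        this.remainingBytes = remaining;
      }

      // Called when the namenode allocates a block to this node.
      synchronized void incBlocksScheduled() {
        blocksScheduled++;
      }

      // Called when the node reports the block received (or the write fails).
      synchronized void decBlocksScheduled() {
        if (blocksScheduled > 0) blocksScheduled--;
      }

      // Target selection: the node must have room for the new block plus
      // every block already scheduled to it but not yet written.
      synchronized boolean hasRoomFor(long blockSize) {
        long scheduledBytes = (long) blocksScheduled * blockSize;
        return remainingBytes - scheduledBytes >= blockSize;
      }
    }

With this accounting, a small amount of space freed on an almost-full node no longer looks like room for many blocks, so the namenode should stop over-assigning and the pipeline failures above should go away.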