HDFS-14626

Decommission all nodes hosting last block of open file succeeds unexpectedly


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I have been investigating scenarios that cause decommission to hang, especially one long-standing issue: an open block on a host that is being decommissioned can cause the process to never complete.

      Checking the history, there seems to have been at least one change in HDFS-5579 which greatly improved the situation, but from reading comments and support cases, there still seem to be some scenarios where open blocks on a DN host cause the decommission to get stuck.

      No matter what I try, I have not been able to reproduce this, but I think I have uncovered another issue that may partly explain why.

      If I do the following, the nodes will decommission without any issues:

      1. Create a file and write to it so it crosses a block boundary. Then there is one complete block and one under-construction block. Keep the file open, and write a few bytes periodically.

      2. Now note the nodes on which the UC block is currently being written, and decommission them all.

      3. The decommission should succeed.

      4. Now attempt to close the open file, and it will fail to close with an error like the one below, probably because decommissioned nodes are not allowed to send IBRs:

      java.io.IOException: Unable to close file because the last block BP-646926902-192.168.0.20-1562099323291:blk_1073741827_1003 does not have enough number of replicas.
          at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:968)
          at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:911)
          at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:894)
          at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:849)
          at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
          at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)

      Interestingly, if you recommission the nodes without restarting them before closing the file, it will close OK, and writes to it can continue even once decommission has completed.

      I don't think this is expected, i.e. decommission should not be able to complete on all nodes hosting the last UC block of a file?
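      For reference, below is a rough client-side sketch of the above steps against a running cluster. The path, the 1 MB block size, and the sleep placeholder are arbitrary choices (not taken from the attached patch), and the decommission itself is done out of band, e.g. via the exclude file and "hdfs dfsadmin -refreshNodes".

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.BlockLocation;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class DecomReproSketch {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          Path p = new Path("/tmp/decom-repro");   // arbitrary test path
          long blockSize = 1024 * 1024;            // small blocks keep the repro quick

          // Step 1: write past the block boundary so there is one complete block
          // and one under-construction block, and keep the stream open.
          FSDataOutputStream out = fs.create(p, true, 4096, (short) 3, blockSize);
          byte[] chunk = new byte[4096];
          long written = 0;
          while (written <= blockSize) {
            out.write(chunk);
            written += chunk.length;
          }
          out.hflush();

          // Step 2: note the hosts of the last block; after the hflush the last
          // entry should correspond to the block still being written.
          BlockLocation[] locs = fs.getFileBlockLocations(p, 0, written);
          for (String host : locs[locs.length - 1].getHosts()) {
            System.out.println("Last block replica on: " + host);
          }

          // Steps 3-4: decommission those hosts and wait for it to complete
          // (placeholder sleep), then try to close. In the scenario described
          // above, the close fails with "does not have enough number of replicas".
          Thread.sleep(10 * 60 * 1000);
          out.close();
        }
      }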

      From what I have figured out, I don't think UC blocks are considered in the DatanodeAdminManager at all. This is because the original list of blocks it cares about is taken from the Datanode block iterator, which reads them from the DatanodeStorageInfo objects attached to the datanode instance. I believe UC blocks don't make it into the DatanodeStorageInfo until after they have been completed and an IBR sent, so the decommission logic never considers them.
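      To make the suspected mechanism concrete, here is a deliberately tiny toy model (plain Java, not Hadoop code; all names are made up). The "reportedBlocks" set stands in for the block list held on the DatanodeStorageInfo, which is only populated from block reports / IBRs, so a block that has never been reported never enters the scan and cannot hold up decommission.

      import java.util.HashSet;
      import java.util.Set;

      public class ToyAdminScan {

        // Toy stand-in for a datanode: only blocks that have been reported appear here.
        static class ToyDatanode {
          final Set<Long> reportedBlocks = new HashSet<>();
        }

        // Returns true if the node can finish decommissioning: the scan only visits
        // blocks it can see on the node's storages.
        static boolean canFinishDecommission(ToyDatanode dn) {
          for (long blockId : dn.reportedBlocks) {
            // ... check that blockId has enough live replicas elsewhere (elided) ...
          }
          // A UC block that was never reported is not in the set, so nothing here
          // stops the node from completing.
          return true;
        }

        public static void main(String[] args) {
          ToyDatanode dn = new ToyDatanode();
          dn.reportedBlocks.add(1073741826L); // the completed first block
          long ucBlock = 1073741827L;         // the open last block, never reported
          System.out.println("UC block visible to scan? "
              + dn.reportedBlocks.contains(ucBlock));   // false
          System.out.println("Decommission allowed to finish? "
              + canFinishDecommission(dn));             // true
        }
      }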

      What troubles me about this explanation is this: if the decommission logic never checks for open blocks, how did they previously cause decommission to get stuck? I suspect I am missing something.

      I will attach a patch with a test case that demonstrates this issue. This reproduces on trunk and I also tested on CDH 5.8.1, which is based on the 2.6 branch, but with a lot of backports.

      Attachments

        1. test-to-reproduce.patch
          3 kB
          Stephen O'Donnell


            People

              Assignee: Unassigned
              Reporter: Stephen O'Donnell (sodonnell)
              Votes: 1
              Watchers: 9
