[HDFS-15875] Check whether file is being truncated before truncate - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.0, 3.1.4, 3.2.2
Fix Version/s: 3.3.1, 3.4.0, 3.2.3
Component/s: datanode, fs, namenode
Labels:
- pull-request-available

Target Version/s: 3.3.1, 3.4.0, 3.2.3
Hadoop Flags: Reviewed

Description

We hit the following problem:

1. Job A sends a truncate request to the NameNode, and block recovery starts.
2. DataNode D times out (after 60 s) while connecting to another DataNode, so the block recovery takes more than 60 seconds.
3. Job A fails, and job B starts and sends another truncate request to the NameNode. A new recoveryId is generated while the lease is being recovered.
4. DataNode D calls commitBlockSynchronization and gets the error "does not match current recovery id".

As a result, the truncate never completes. DataNode D has a replica with the new length, while the other two DataNodes have replicas with the old length.
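To make the failure sequence concrete, here is a heavily simplified Java model of it. Everything in it is hypothetical: `TruncateRaceSketch`, its `NameNode` and `FileState` classes, and the recovery-id counter are illustrative stand-ins, not Hadoop classes. The `rejectWhileTruncating` flag is only an assumption about what the check named in the issue title amounts to; it is not the code from the pull request. The model shows a second truncate issuing a fresh recovery id so that DataNode D's commit with the old id is rejected, and how refusing a truncate on a file that is already being truncated avoids that.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of the HDFS-15875 race.
// None of these names are Hadoop APIs; they only mimic the bookkeeping.
class TruncateRaceSketch {

  // Per-file state the "NameNode" keeps while a truncate recovery runs.
  static final class FileState {
    long length;
    long targetLength;       // length requested by the pending truncate
    boolean beingTruncated;  // true while block recovery for a truncate is in flight
    long recoveryId;         // id the primary DataNode must echo back on commit

    FileState(long length) {
      this.length = length;
    }
  }

  static final class NameNode {
    private final Map<String, FileState> files = new HashMap<>();
    private long nextRecoveryId = 1000;
    // Assumption: models the check named in the issue title, not the actual patch.
    private final boolean rejectWhileTruncating;

    NameNode(boolean rejectWhileTruncating) {
      this.rejectWhileTruncating = rejectWhileTruncating;
    }

    void create(String path, long length) {
      files.put(path, new FileState(length));
    }

    // Starts a truncate and returns the recovery id sent to the primary DataNode.
    long truncate(String path, long newLength) {
      FileState f = files.get(path);
      if (f.beingTruncated && rejectWhileTruncating) {
        throw new IllegalStateException(path + " is being truncated");
      }
      f.beingTruncated = true;
      f.targetLength = newLength;
      f.recoveryId = nextRecoveryId++; // every truncate issues a fresh id
      return f.recoveryId;
    }

    // Called by the primary DataNode once its (possibly slow) recovery finishes.
    void commitBlockSynchronization(String path, long recoveryId) {
      FileState f = files.get(path);
      if (recoveryId != f.recoveryId) {
        throw new IllegalStateException("The recovery id " + recoveryId
            + " does not match current recovery id " + f.recoveryId);
      }
      f.length = f.targetLength;
      f.beingTruncated = false;
    }

    boolean isBeingTruncated(String path) {
      return files.get(path).beingTruncated;
    }

    long length(String path) {
      return files.get(path).length;
    }
  }

  public static void main(String[] args) {
    // Without the check: job B's truncate invalidates job A's recovery id.
    NameNode nn = new NameNode(false);
    nn.create("/f", 100);
    long idA = nn.truncate("/f", 50); // job A; DataNode D's recovery takes 60+ s
    nn.truncate("/f", 50);            // job B truncates while that recovery runs
    try {
      nn.commitBlockSynchronization("/f", idA);
    } catch (IllegalStateException e) {
      System.out.println("commit rejected: " + e.getMessage());
    }
    System.out.println("still being truncated: " + nn.isBeingTruncated("/f"));

    // With the check: job B is refused and job A's commit goes through.
    NameNode guarded = new NameNode(true);
    guarded.create("/f", 100);
    long idA2 = guarded.truncate("/f", 50);
    try {
      guarded.truncate("/f", 50);
    } catch (IllegalStateException e) {
      System.out.println("second truncate refused: " + e.getMessage());
    }
    guarded.commitBlockSynchronization("/f", idA2);
    System.out.println("length after commit: " + guarded.length("/f"));
  }
}
```

Running `main` shows the stale commit being rejected in the unguarded model and the guarded model finishing with a length of 50. The model stops at the rejected commit; in the reported cluster the follow-up recovery then fails on the DataNode side, because D's replica no longer matches the other two.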

The DataNode logs the error "Inconsistent size of finalized replicas".

The related code is in BlockRecoveryWorker.java:


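// Pick the best original replica state (lowest getValue()) among the replicas
// being synchronized, and require every FINALIZED replica to have the same length.
for (BlockRecord r : syncList) {
  assert r.rInfo.getNumBytes() > 0 : "zero length replica";
  ReplicaState rState = r.rInfo.getOriginalReplicaState();
  if (rState.getValue() < bestState.getValue()) {
    bestState = rState;
  }
  if (rState == ReplicaState.FINALIZED) {
    // In this report, D's replica already has the new length while the other
    // two still have the old length, so this check throws on every retry.
    if (finalizedLength > 0 && finalizedLength != r.rInfo.getNumBytes()) {
      throw new IOException("Inconsistent size of finalized replicas. " +
          "Replica " + r.rInfo + " expected size: " + finalizedLength);
    }
    finalizedLength = r.rInfo.getNumBytes();
  }
}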
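The check above can be exercised outside Hadoop with a small stand-alone re-creation. `FinalizedLengthCheck`, `State` and `Replica` are hypothetical stand-ins for `ReplicaState` and `BlockRecord` (the enum's ordinal order plays the role of `getValue()`), and the lengths 50 and 100 are made-up values for "new length" and "old length"; only the comparison logic follows the quoted loop. Feeding it the replica set from this report, one truncated FINALIZED replica on DataNode D plus two untruncated ones, reproduces the exception.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone re-creation of the finalized-length check.
// State and Replica are illustrative stand-ins, not Hadoop types.
class FinalizedLengthCheck {

  // Ordinal order stands in for ReplicaState.getValue(): lower is better.
  enum State { FINALIZED, RBW, RWR }

  static final class Replica {
    final String datanode;
    final State state;
    final long numBytes;

    Replica(String datanode, State state, long numBytes) {
      this.datanode = datanode;
      this.state = state;
      this.numBytes = numBytes;
    }

    @Override
    public String toString() {
      return datanode + "[" + state + ", " + numBytes + " bytes]";
    }
  }

  // Returns the best state, or throws if FINALIZED replicas disagree on length.
  static State bestState(List<Replica> syncList) throws IOException {
    State bestState = State.RWR; // start from the worst state used in this sketch
    long finalizedLength = -1;
    for (Replica r : syncList) {
      assert r.numBytes > 0 : "zero length replica";
      if (r.state.ordinal() < bestState.ordinal()) {
        bestState = r.state;
      }
      if (r.state == State.FINALIZED) {
        if (finalizedLength > 0 && finalizedLength != r.numBytes) {
          throw new IOException("Inconsistent size of finalized replicas. "
              + "Replica " + r + " expected size: " + finalizedLength);
        }
        finalizedLength = r.numBytes;
      }
    }
    return bestState;
  }

  public static void main(String[] args) {
    // D already holds the truncated replica; A and B still hold the old length.
    List<Replica> syncList = Arrays.asList(
        new Replica("D", State.FINALIZED, 50),
        new Replica("A", State.FINALIZED, 100),
        new Replica("B", State.FINALIZED, 100));
    try {
      System.out.println("best state: " + bestState(syncList));
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because every FINALIZED replica is compared against the first one seen, the order of `syncList` only changes which replica the message names; any mix of new and old lengths fails, so retrying recovery cannot succeed on its own.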

Issue Links

links to

GitHub Pull Request #2746

People

Assignee:: Hui Fei

Reporter:: Hui Fei

Votes:: 0

Watchers:: 3

Dates

Created:: 05/Mar/21 05:44

Updated:: 16/Oct/24 18:29

Resolved:: 10/Mar/21 06:12

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

1h 50m