Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
0.22.0, 0.23.1, 1.0.0, 1.1.0, 2.0.0-alpha
-
None
-
Reviewed
Description
While testing HA (internal) with continuous switches at roughly 5-minute intervals, I found that some blocks were missing and the namenode went into safemode after the next switch.
After analysis, I found that these files had already been deleted by clients, but I don't see any delete command logs in the namenode log files. Even so, the namenode added those blocks to invalidateSets and the DNs deleted the blocks.
When the namenode restarted, it went into safemode and was expecting some more blocks before it could come out of safemode.
The reason could be that the file is deleted in memory and its blocks are added to invalidates, and only after this does the namenode try to sync the edits to the editlog file. By that time the NN has already asked the DNs to delete those blocks. The namenode then shuts down before persisting to the editlog (the log lags behind).
For this reason, we may not get the INFO logs about the delete, and when we restart the namenode (in my scenario, this is another switch), the namenode still expects these deleted blocks, because the delete request was never persisted to the editlog.
I reproduced this scenario with debug points. I feel we should not add the blocks to invalidates before persisting to the editlog.
Note: for the switch, we used kill -9 (force kill).
I am currently on version 0.20.2. The same was verified on 0.23 as well, in a normal crash + restart scenario.
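The ordering problem described above can be illustrated with a small toy model. This is only a sketch and not the actual NameNode code: `ToyNameNode`, `durableEditLog`, `invalidateSet`, and `missingAfterCrash` are hypothetical names invented for the illustration, and the "crash" is simulated by simply stopping after the first step of the delete. It compares the two orderings: adding blocks to the invalidate set before the edit is persisted, versus persisting first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteOrderingSketch {

    /** Toy stand-in for a namenode; every name here is hypothetical. */
    static class ToyNameNode {
        final Set<String> namespace = new HashSet<>();         // in-memory view of blocks
        final List<String> durableEditLog = new ArrayList<>(); // survives a crash
        final Set<String> invalidateSet = new HashSet<>();     // blocks DNs are told to delete

        void addBlock(String block) {
            namespace.add(block);
            durableEditLog.add("ADD " + block);
        }

        /** Rebuild the set of expected blocks by replaying the durable edit log. */
        Set<String> replayAfterRestart() {
            Set<String> expected = new HashSet<>();
            for (String op : durableEditLog) {
                String[] parts = op.split(" ", 2);
                if (parts[0].equals("ADD")) {
                    expected.add(parts[1]);
                } else {
                    expected.remove(parts[1]);
                }
            }
            return expected;
        }
    }

    /**
     * Deletes one block and simulates a kill -9 after the first step of the delete.
     * Returns the blocks the restarted namenode expects but the datanode no longer holds.
     */
    static Set<String> missingAfterCrash(boolean persistFirst) {
        ToyNameNode nn = new ToyNameNode();
        Set<String> dataNodeBlocks = new HashSet<>();
        nn.addBlock("blk_1");
        dataNodeBlocks.add("blk_1");
        nn.addBlock("blk_2");
        dataNodeBlocks.add("blk_2");

        nn.namespace.remove("blk_1"); // delete applied in memory
        if (persistFirst) {
            nn.durableEditLog.add("DELETE blk_1");
            // Simulated crash here: the invalidate set was never filled,
            // so the datanode still holds blk_1.
        } else {
            nn.invalidateSet.add("blk_1");
            dataNodeBlocks.removeAll(nn.invalidateSet); // datanode acts on the invalidation
            // Simulated crash here: the DELETE never reached the edit log.
        }

        Set<String> missing = new HashSet<>(nn.replayAfterRestart());
        missing.removeAll(dataNodeBlocks);
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("invalidate first: missing = " + missingAfterCrash(false));
        System.out.println("persist first:    missing = " + missingAfterCrash(true));
    }
}
```

In this model, invalidating first leaves `blk_1` expected by the restarted namenode but absent from the datanode, which corresponds to the safemode symptom above. Persisting first leaves nothing missing; at worst the datanode holds a block that the namenode no longer references.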
Attachments
Issue Links
- is blocked by
  - HDFS-3791 Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes (Closed)
- is broken by
  - HDFS-173 Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes (Closed)
- is related to
  - HDFS-5474 Deletesnapshot can make Namenode in safemode on NN restarts. (Closed)