[HDFS-7815] Loop on 'blocks does not belong to any file' - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.6.0
Fix Version/s: None
Component/s: datanode, namenode
Labels: None
Environment:

Small cluster on Red Hat. 2 namenodes (HA), 6 datanodes with 19 TB disk for HDFS.

Description

I am currently experiencing a looping situation:
The namenode takes approximately 1:50 (min:sec) to log a massive number of lines stating that some blocks don't belong to any file. During this time, it is unresponsive to any requests from datanodes, and if ZooKeeper had been running, it would have taken the namenode down (ssh-fencing: kill).
When it has finished the 'round', it starts to do some normal work, which among other things includes telling the datanode to delete the blocks. But by the time the datanode has gotten around to deleting the blocks and is about to report back to the namenode, the namenode has started on the next round of reporting the same blocks that don't belong to any file. Thus, the datanode gets a timeout when reporting block updates for the deleted blocks, and this, of course, repeats itself over and over again...

There are actually two issues, I think:
1 - the namenode becomes totally unresponsive when reporting the blocks (could this be a DEBUG line instead of an INFO line?)
2 - the namenode seems to 'forget' that it has already reported those blocks just 2-3 minutes ago...
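
To illustrate the suggestion in point 1, here is a minimal sketch of moving the per-block message to debug level and emitting a single INFO summary instead. This is not the HDFS code and not the fix applied in HDFS-7503; the class name, method name, and parameters are hypothetical, and it uses java.util.logging only to stay self-contained.

```java
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: none of these names come from the HDFS code base.
public class BlockReportLogSketch {
    private static final Logger LOG = Logger.getLogger("BlockReportLogSketch");

    // Counts reported blocks that no file owns. Per-block detail is logged
    // only at FINE (debug) level; one summary line is logged at INFO.
    static int processReport(List<Long> reportedBlockIds, Set<Long> blocksOwnedByFiles) {
        int orphans = 0;
        for (long id : reportedBlockIds) {
            if (!blocksOwnedByFiles.contains(id)) {
                orphans++;
                // Guard so the message string is not even built unless
                // debug-level logging is enabled.
                if (LOG.isLoggable(Level.FINE)) {
                    LOG.fine("block " + id + " does not belong to any file");
                }
            }
        }
        if (orphans > 0) {
            LOG.info(orphans + " reported blocks do not belong to any file");
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Blocks 1 and 3 are not owned by any file in this toy input.
        System.out.println(processReport(List.of(1L, 2L, 3L), Set.of(2L)));
    }
}
```

With this shape, the cost of logging grows with the number of reports rather than the number of orphaned blocks, unless debug logging is switched on deliberately.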

Issue Links

duplicates: HDFS-7503 Namenode restart after large deletions can cause slow processReport (due to logging) (Closed)

People

Assignee: Unassigned

Reporter: Frode Halvorsen

Votes: 0

Watchers: 4

Dates

Created: 20/Feb/15 10:03

Updated: 17/Jan/17 12:45

Resolved: 20/Feb/15 17:47