Issue Details

Key: HADOOP-4971
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Raghu Angadi
Reporter: Raghu Angadi
Votes: 0
Watchers: 3
Hadoop Common

Block report times from datanodes could converge to same time.

Created: 31/Dec/08 08:40 PM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: 0.18.0
Fix Version/s: 0.18.3

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  Name                          Date                 Author        Size
  HADOOP-4971-branch-18.patch   2009-01-07 11:42 PM  Raghu Angadi  1.0 kB
  HADOOP-4971.patch             2009-01-07 11:19 PM  Raghu Angadi  1 kB
  HADOOP-4971.patch             2009-01-07 03:27 AM  Raghu Angadi  1 kB

Hadoop Flags: Reviewed
Release Note: A long (unexpected) delay at datanodes could cause subsequent block reports from many datanodes to be sent at the same time.
Resolution Date: 08/Jan/09 01:03 AM


Description
Datanode block reports take a significant amount of memory to process at the namenode. After the initial report, each DN picks a random time to send its block reports so that this load is spread out at the NN. This normally works fine.
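The random spreading described above can be sketched as follows. This is a minimal, hypothetical illustration, not the DataNode code: the class `InitialReportSpread`, the method `firstReportTime`, and the one-hour interval are assumptions chosen for the example.

```java
import java.util.Random;

/**
 * Hypothetical sketch: each datanode picks a random offset within the
 * block report interval so that reports from many DNs are spread out.
 * Names are illustrative only; this is not Hadoop's implementation.
 */
class InitialReportSpread {
    /** Returns the time (ms) at which this DN sends its first periodic report. */
    static long firstReportTime(long startMs, long intervalMs, Random rng) {
        // nextDouble() is in [0, 1), so the offset is in [0, intervalMs):
        // DNs that start together still land at different points in the interval.
        long offset = (long) (rng.nextDouble() * intervalMs);
        return startMs + offset;
    }

    public static void main(String[] args) {
        long interval = 60L * 60 * 1000; // one hour, in milliseconds (assumed)
        Random rng = new Random(42);     // fixed seed so the demo is repeatable
        for (int dn = 0; dn < 5; dn++) {
            long t = firstReportTime(0L, interval, rng);
            System.out.printf("DN %d first report at +%d s%n", dn, t / 1000);
        }
    }
}
```

Each DN draws its own offset, so the expensive reports reach the NN at different moments instead of all at once.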

Block reports are sent from the offerService() thread in the DN. If for some reason this thread is stuck for a long time (comparable to the block report interval), and the same thing happens on many DNs, all of them return to the loop at the same time and start sending block reports at that moment, and then every hour at the same time.
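A simplified model makes the convergence concrete. It assumes, hypothetically, that the loop sends an overdue report as soon as it resumes and then schedules the next one a full interval after that send; `ConvergenceDemo` and `reportTimes` are illustrative names, not Hadoop code.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Simplified, hypothetical model of a DN's periodic block-report check.
 * It is NOT the actual DataNode.offerService() code; it only shows how a
 * shared stall can align the report times of many DNs.
 */
class ConvergenceDemo {
    static final long INTERVAL = 3_600_000L; // one hour in ms (assumed)

    /**
     * Returns the next `count` report times for a DN whose last report was at
     * lastReportMs and whose loop was stuck (not running) until resumeMs.
     */
    static long[] reportTimes(long lastReportMs, long resumeMs, int count) {
        long[] times = new long[count];
        // An overdue report is sent as soon as the loop resumes.
        long t = Math.max(lastReportMs + INTERVAL, resumeMs);
        for (int i = 0; i < count; i++) {
            times[i] = t;
            t += INTERVAL; // next report is one interval after the one just sent
        }
        return times;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        long resume = 5 * INTERVAL; // every DN's loop was stuck until this time
        for (int dn = 0; dn < 3; dn++) {
            // Each DN's last report happened at a random point in the first hour.
            long last = (long) (rng.nextDouble() * INTERVAL);
            // Every DN prints [18000000, 21600000, 25200000].
            System.out.println("DN " + dn + ": " + Arrays.toString(reportTimes(last, resume, 3)));
        }
    }
}
```

Because the schedule restarts from the moment the loop resumes, the DNs' original random offsets are lost, and every DN that was stuck past its due time ends up on the same hourly schedule.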

The RPC server and clients in 0.18 can handle this situation fine. But since this is a memory-intensive RPC, it leads to large GC delays at the NN. We don't know yet why the offerService threads seemed to be stuck, but the DN should re-randomize its block report time in such cases.
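One way the re-randomization suggested above might look, as a minimal sketch rather than the committed patch: when the loop notices it is overdue by at least a full interval, it treats that as a stall and draws a fresh random report time within the next interval. `RerandomizingScheduler` and its methods are hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical sketch: if the report loop finds it is overdue by a full
 * interval or more (a sign the thread was stuck), it re-randomizes instead
 * of reporting immediately and staying aligned with other stalled DNs.
 * Names are illustrative, not Hadoop's.
 */
class RerandomizingScheduler {
    private final long intervalMs;
    private final Random rng;
    private long nextReportMs;

    RerandomizingScheduler(long intervalMs, long firstReportMs, Random rng) {
        this.intervalMs = intervalMs;
        this.nextReportMs = firstReportMs;
        this.rng = rng;
    }

    /** Called on each pass of the service loop; returns true if a report should be sent now. */
    boolean shouldReport(long nowMs) {
        if (nowMs < nextReportMs) {
            return false; // not due yet
        }
        if (nowMs - nextReportMs >= intervalMs) {
            // Stalled for at least one full interval: pick a fresh random
            // time in [now, now + interval) instead of reporting right away.
            nextReportMs = nowMs + (long) (rng.nextDouble() * intervalMs);
            return false;
        }
        // On time (or only slightly late): report now, schedule the next one.
        nextReportMs = nowMs + intervalMs;
        return true;
    }

    long nextReportMs() {
        return nextReportMs;
    }

    public static void main(String[] args) {
        long interval = 3_600_000L; // one hour in ms (assumed)
        long resume = 5 * interval; // all DNs were stuck until this time
        for (int dn = 0; dn < 3; dn++) {
            RerandomizingScheduler s =
                new RerandomizingScheduler(interval, 0L, new Random(dn));
            s.shouldReport(resume); // detects the stall and re-randomizes
            System.out.println("DN " + dn + " next report at " + s.nextReportMs());
        }
    }
}
```

This sketch defers the overdue report by up to one interval; sending it immediately and randomizing only the following report is an equally plausible variant, trading one synchronized burst for a shorter gap in reporting.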


