
Key: HADOOP-641
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Konstantin Shvachko
Reporter: Konstantin Shvachko
Votes: 0
Watchers: 0
Hadoop Common

Name-node should demand a block report from resurrected data-nodes.

Created: 26/Oct/06 01:06 AM   Updated: 03/Nov/06 10:40 PM
Component/s: None
Affects Version/s: 0.1.0, 0.7.2
Fix Version/s: 0.8.0

Time Tracking:
Not Specified

File Attachments:
ResurrectDN.patch (text file, 12 kB), attached 2006-10-26 01:38 AM by Konstantin Shvachko; licensed for inclusion in ASF works
Issue Links:
Reference
 

Resolution Date: 26/Oct/06 08:22 PM


Description
1. This bug contributed to the crash discussed in HADOOP-572.
When the name-node is busy and cannot process all requests from its clients,
it can declare a data-node dead and discard its blocks, placing them on the neededReplications list.
When it finally receives a heartbeat from that data-node, it resurrects the node but not the node's blocks,
and hence continues to replicate them.
Eventually the name-node will receive a block report from the data-node, but that can take up
to 1 hour. In the meantime it proceeds with unnecessary block replications, which could be avoided if the
data-node sent its block report right after the resurrection.

I modified the code so that the name-node requests a block report when it receives a heartbeat from a dead data-node.
I introduced a new command type in the BlockCommand class.
I replaced the multiple boolean indicators of the command type with a single enum field.
I changed the DatanodeProtocol version.
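
For illustration only, the change described above could be sketched roughly as follows. This is not the code from ResurrectDN.patch: the class ResurrectionSketch, the Action enum and its constants, and the methods register, expireStaleNodes and handleHeartbeat are all hypothetical names chosen for this example. It shows a single enum standing in for several boolean flags, and a heartbeat handler that asks for a block report when the sender had previously been marked dead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names do not mirror the actual patch or Hadoop classes.
class ResurrectionSketch {

    // One enum field replaces several boolean command-type indicators.
    enum Action { NONE, TRANSFER, INVALIDATE, REQUEST_BLOCK_REPORT }

    // Minimal per-data-node bookkeeping kept by the name-node.
    static final class NodeState {
        long lastHeartbeatMillis;
        boolean alive;
    }

    static final class NameNodeSketch {
        private final long expiryMillis;
        private final Map<String, NodeState> nodes = new HashMap<>();

        NameNodeSketch(long expiryMillis) {
            this.expiryMillis = expiryMillis;
        }

        void register(String nodeId, long now) {
            NodeState state = new NodeState();
            state.alive = true;
            state.lastHeartbeatMillis = now;
            nodes.put(nodeId, state);
        }

        // Periodic check: a node whose last heartbeat is too old is marked dead.
        // (A real name-node would also queue that node's blocks for replication.)
        void expireStaleNodes(long now) {
            for (NodeState state : nodes.values()) {
                if (state.alive && now - state.lastHeartbeatMillis > expiryMillis) {
                    state.alive = false;
                }
            }
        }

        // A heartbeat from a node that was considered dead resurrects it and
        // triggers an immediate block report request.
        Action handleHeartbeat(String nodeId, long now) {
            NodeState state = nodes.get(nodeId);
            if (state == null) {
                throw new IllegalArgumentException("unregistered node: " + nodeId);
            }
            boolean wasDead = !state.alive;
            state.alive = true;
            state.lastHeartbeatMillis = now;
            return wasDead ? Action.REQUEST_BLOCK_REPORT : Action.NONE;
        }
    }

    public static void main(String[] args) {
        NameNodeSketch nameNode = new NameNodeSketch(1000);
        nameNode.register("dn1", 0);
        System.out.println(nameNode.handleHeartbeat("dn1", 500));  // NONE
        nameNode.expireStaleNodes(5000);                           // dn1 is now considered dead
        System.out.println(nameNode.handleHeartbeat("dn1", 5001)); // REQUEST_BLOCK_REPORT
        System.out.println(nameNode.handleHeartbeat("dn1", 5002)); // NONE
    }
}
```

Returning the request as the heartbeat reply means the data-node learns about it on its very next contact, rather than waiting for its regular report interval.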

2. This patch also includes a fix for data-node registration. If a data-node times out during registration,
it silently exits, which is hard to notice with a large number of nodes. This patch places registration in a loop
so that the data-node can retry.
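
A registration retry loop of the kind described in item 2 might look like the sketch below. It is an assumption-laden illustration, not the patch itself: the Registrar interface, the registerWithRetry method and the sleep interval are hypothetical, and the real data-node code may log, back off, or bound the number of attempts differently.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical sketch; names do not mirror the actual patch or Hadoop classes.
class RegistrationRetrySketch {

    // Stand-in for the remote registration call made by a data-node.
    interface Registrar {
        String register() throws IOException;
    }

    // Keeps retrying while registration times out, instead of exiting silently.
    // Any other IOException still propagates to the caller.
    static String registerWithRetry(Registrar registrar, long sleepMillis)
            throws IOException, InterruptedException {
        while (true) {
            try {
                return registrar.register();
            } catch (SocketTimeoutException e) {
                // Make the failure visible, then wait before the next attempt.
                System.err.println("Registration timed out, retrying: " + e.getMessage());
                Thread.sleep(sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated name-node that times out twice before accepting the node.
        String result = registerWithRetry(() -> {
            if (++attempts[0] < 3) {
                throw new SocketTimeoutException("attempt " + attempts[0]);
            }
            return "registered";
        }, 10L);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```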



Subversion Commits
Repository Revision Date User Message
ASF #468115 Thu Oct 26 20:22:36 UTC 2006 cutting HADOOP-641. Change NameNode to request a fresh block report from re-discovered DataNodes. Contributed by Konstantin.
Files Changed
MODIFY /lucene/hadoop/trunk/src/java/org/apache/hadoop/dfs/BlockCommand.java
MODIFY /lucene/hadoop/trunk/src/java/org/apache/hadoop/dfs/NameNode.java
MODIFY /lucene/hadoop/trunk/src/java/org/apache/hadoop/dfs/DataNode.java
MODIFY /lucene/hadoop/trunk/src/java/org/apache/hadoop/dfs/FSNamesystem.java
MODIFY /lucene/hadoop/trunk/CHANGES.txt
MODIFY /lucene/hadoop/trunk/src/java/org/apache/hadoop/dfs/DatanodeProtocol.java

Repository Revision Date User Message
ASF #817884 Tue Sep 22 22:59:25 UTC 2009 omalley HADOOP-641. Moving block forensics over from hdfs.
Files Changed
ADD /hadoop/mapreduce/branches/HDFS-641/src/contrib/block_forensics (from /hadoop/hdfs/branches/branch-0.21/src/contrib/block_forensics)