
Key: HADOOP-4910
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Hairong Kuang
Reporter: Hairong Kuang
Votes: 0
Watchers: 1

Hadoop Common

NameNode should exclude corrupt replicas when choosing excessive replicas to delete

Created: 18/Dec/08 12:47 AM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: 0.17.0
Fix Version/s: 0.18.3

Time Tracking: Not Specified

File Attachments (all licensed for inclusion in ASF works):
overReplicated.patch, 2009-01-02 09:10 PM, Hairong Kuang, 1 kB
overReplicated1.patch, 2009-01-06 06:56 PM, Hairong Kuang, 6 kB
overReplicated2-br18.patch, 2009-01-08 12:57 AM, Hairong Kuang, 6 kB
overReplicated2.patch, 2009-01-08 12:58 AM, Hairong Kuang, 6 kB

Hadoop Flags: Reviewed
Resolution Date: 08/Jan/09 06:38 PM


Description:
Currently, when the NameNode handles an over-replicated block in FSNamesystem#processOverReplicatedBlock, it excludes replicas already in excessReplicateMap and decommissioned ones, but it treats a corrupt replica as a valid one. This may lead to the deletion of more replicas than necessary and thus cause data loss. It should exclude corrupt replicas as well.
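
To illustrate the behavior described above, here is a minimal, self-contained sketch. It is not the code from the attached patches and does not use Hadoop's classes: Replica, deletionCandidates, and excessCount are hypothetical names, and node names stand in for datanodes. It only shows how counting a corrupt replica as valid inflates the number of replicas considered excess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OverReplicationSketch {

    /** Hypothetical stand-in for a replica location; not Hadoop's DatanodeDescriptor. */
    public static final class Replica {
        final String node;
        final boolean decommissioned;

        public Replica(String node, boolean decommissioned) {
            this.node = node;
            this.decommissioned = decommissioned;
        }
    }

    /**
     * Returns the replicas that may be counted as live when deciding what to delete.
     * Replicas already marked excess, decommissioned replicas, and (the point of
     * this issue) corrupt replicas are all left out.
     */
    public static List<Replica> deletionCandidates(List<Replica> replicas,
                                                   Set<String> alreadyExcess,
                                                   Set<String> corrupt) {
        List<Replica> candidates = new ArrayList<>();
        for (Replica r : replicas) {
            if (alreadyExcess.contains(r.node)) continue; // already scheduled for deletion
            if (r.decommissioned) continue;               // node is decommissioned
            if (corrupt.contains(r.node)) continue;       // corrupt copy is not a valid replica
            candidates.add(r);
        }
        return candidates;
    }

    /** Number of replicas beyond the target replication factor. */
    public static int excessCount(List<Replica> candidates, int replication) {
        return Math.max(0, candidates.size() - replication);
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica("dn1", false), new Replica("dn2", false),
                new Replica("dn3", false), new Replica("dn4", false));
        Set<String> none = Set.of();
        Set<String> corrupt = Set.of("dn4"); // assume the copy on dn4 is corrupt

        // Passing an empty corrupt set mimics treating the corrupt replica as valid.
        System.out.println("ignoring corruption: "
                + excessCount(deletionCandidates(replicas, none, none), 3));
        System.out.println("excluding corrupt:   "
                + excessCount(deletionCandidates(replicas, none, corrupt), 3));
    }
}
```

With a replication factor of 3 and four replicas, one of them corrupt, the first call reports 1 excess replica, so a healthy copy could be chosen for deletion and only two good copies would remain. The second call reports 0, so no healthy copy is deleted.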

Comments:
Zheng Shao added a comment - 18/Dec/08 12:55 AM
Is this only in 0.17.0, or later versions as well?

Hairong Kuang added a comment - 02/Jan/09 09:10 PM
I am still working on a junit testcase, but attaching the fix first.

Yes, this bug affects 0.17 as well as later releases.


Hairong Kuang added a comment - 06/Jan/09 06:57 PM
This patch has a unit test. Without the fix, the unit test will fail, showing that the block gets lost.

Raghu Angadi added a comment - 07/Jan/09 06:27 PM
+1 patch looks good. Simple but important fix.
Minor nit: I would suggest removing the reference to this JIRA in the comments. Otherwise it gives the impression that the reader should read this JIRA to understand what is happening, which is not necessary.

Hairong Kuang added a comment - 07/Jan/09 11:26 PM
This is a patch for 0.18. Incorporated Raghu's comment.

Hairong Kuang added a comment - 08/Jan/09 12:58 AM
Patch for the trunk.

Hairong Kuang added a comment - 08/Jan/09 06:18 PM
Ant test-patch passed
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 9 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
Ant test-core failed with a known junit test failure reported at HADOOP-4907. All other unit tests passed.

Hairong Kuang added a comment - 08/Jan/09 06:38 PM
I've just committed this.