Issue Details

Key: HADOOP-3193
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Chris Douglas
Reporter: Robert Chansler
Votes: 0
Watchers: 0
Hadoop Common

Discovery of corrupt block reported in name node log

Created: 05/Apr/08 01:09 AM   Updated: 08/Jul/09 04:43 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
3193-0.patch, 2 kB, 2008-05-14 08:50 PM, Chris Douglas
3193-1.patch, 2 kB, 2008-06-06 09:27 PM, Chris Douglas
3193-2.patch, 3 kB, 2008-06-06 09:44 PM, Chris Douglas
Issue Links:
Reference

Hadoop Flags: Reviewed, Incompatible change
Release Note: Added reporter to FSNamesystem stateChangeLog, and a new metric to track the number of corrupted replicas.
Resolution Date: 06/Jun/08 10:08 PM


Description
Any discovery of a corrupt/unreadable block must be reported in the name node log.

dhruba borthakur added a comment - 07/Apr/08 06:45 AM
When a client discovers a corrupt block, it reports it to the namenode. The namenode then writes a "ReportBadBlock" message to the namenode log. One improvement would be to enhance the log message to print the blockId(s) as well!

Another improvement would be to report the number of corrupted blocks through the HadoopMetrics API.
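
As a rough illustration of the first suggestion, the sketch below builds a single log line that names the reporter and every reported block ID. It is not the committed patch: the class, the method `formatBadBlockReport`, the message wording, and the `blk_` prefix are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch; none of these names are taken from the Hadoop source. */
public class BadBlockLogSketch {

    /** Builds one log line naming the reporter and every reported block ID. */
    static String formatBadBlockReport(String reporter, List<Long> blockIds) {
        // Join the IDs so a single message covers the whole report.
        String ids = blockIds.stream()
                .map(id -> "blk_" + id) // assumed display form of a block ID
                .collect(Collectors.joining(", "));
        return "reportBadBlocks from " + reporter + ": " + ids;
    }

    public static void main(String[] args) {
        System.out.println(formatBadBlockReport("client-1", Arrays.asList(42L, 43L)));
    }
}
```

Putting all IDs in one line keeps each report greppable by either the reporter or a block ID.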


dhruba borthakur added a comment - 14/May/08 09:00 PM
I am marking this as an incompatible change, especially because a new HadoopMetric config file needs to be deployed to existing clusters to display "BlocksCorrupted".

Tsz Wo (Nicholas), SZE added a comment - 14/May/08 09:32 PM
In what cases should a client (non-datanode client) call reportBadBlocks(...)? I am concerned about the security issue.

Lohit Vijayarenu added a comment - 14/May/08 09:33 PM
+1 patch looks good. One small thing: the metric seems to report the number of corrupt blocks reported over time. Should it be changed to the number of corrupt blocks in the system at any point in time, possibly using MetricsIntValue? Also, namesystem.markBlockAsCorrupt already logs a message about this inside the corruptReplicas.addToCorruptReplicasMap function.
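
The difference between the two kinds of metric can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use the Hadoop metrics classes, and `CorruptBlockMetricSketch` with its methods is a hypothetical name invented for this example.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch contrasting a cumulative count with a point-in-time count. */
public class CorruptBlockMetricSketch {
    private int reportsSeen = 0;                           // only ever grows
    private final Set<Long> corruptNow = new HashSet<>();  // current state

    /** Records a corruption report for the given block. */
    void markCorrupt(long blockId) {
        reportsSeen++;
        corruptNow.add(blockId); // a repeated report does not add a second entry
    }

    /** Records that the block is healthy again (for example, after re-replication). */
    void markRepaired(long blockId) {
        corruptNow.remove(blockId);
    }

    int cumulativeReports() { return reportsSeen; }

    int currentlyCorrupt() { return corruptNow.size(); }

    public static void main(String[] args) {
        CorruptBlockMetricSketch m = new CorruptBlockMetricSketch();
        m.markCorrupt(1L);
        m.markCorrupt(2L);
        m.markCorrupt(1L);   // same block reported twice
        m.markRepaired(2L);
        System.out.println(m.cumulativeReports()); // prints 3
        System.out.println(m.currentlyCorrupt());  // prints 1
    }
}
```

The cumulative value answers "how many reports have arrived", while the point-in-time value answers "how many blocks are corrupt right now", which is the reading proposed in this comment.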

Chris Douglas added a comment - 14/May/08 10:54 PM
Lohit is right about the logging; it's redundant since HADOOP-2065.

Canceling this patch until we decide what to do with the metric.


Sameer Paranjpye added a comment - 14/May/08 11:27 PM
I'd like to do more here. In addition to reporting corrupt blocks in the log, the Namenode should try to determine where the corruption occurred, i.e., on disk on the Datanode vs. elsewhere (network transmission or in memory on the client).

Chris Douglas added a comment - 06/Jun/08 09:25 PM
Revised to include Lohit's feedback

Lohit Vijayarenu added a comment - 06/Jun/08 09:31 PM
+1 Patch looks good

Chris Douglas added a comment - 06/Jun/08 09:44 PM
Fixed findbugs warning

Chris Douglas added a comment - 06/Jun/08 09:52 PM - edited
     [exec] -1 overall.  

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

No tests are included, as the change is only to logging and adding a metric.

[ edit - all dfs tests pass on my machine ]


Chris Douglas added a comment - 06/Jun/08 10:01 PM
In the future, it would be helpful if we included not only where the error occurred, but also more details about the particular error. Created HADOOP-3510 to track this improvement.

Chris Douglas added a comment - 06/Jun/08 10:08 PM
I just committed this.