Hadoop HDFS / HDFS-2263

Make DFSClient report bad blocks more quickly

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels:
      None
    • Target Version/s:

      Description

      In certain circumstances the DFSClient may detect a block as being bad without reporting it promptly to the NN.

      If, when reading a file, a client finds an invalid checksum for a block, it immediately reports that bad block to the NN. If, when serving up a block, a DN finds a truncated block, it reports this to the client, but the client merely adds that DN to its list of dead nodes and moves on to another DN, without reporting anything to the NN.
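      For illustration, a minimal sketch (not part of any attached patch) of what reporting such a replica to the NN could look like, assuming the same ClientProtocol.reportBadBlocks(LocatedBlock[]) RPC that the checksum-failure path already uses; the class and method names, and the exact LocatedBlock constructor, are 0.20-era assumptions and may differ on trunk:

      import java.io.IOException;

      import org.apache.hadoop.hdfs.protocol.ClientProtocol;
      import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
      import org.apache.hadoop.hdfs.protocol.LocatedBlock;

      /** Illustrative sketch only; not taken from HDFS-2263's attached patch. */
      class BadReplicaReporterSketch {
        private final ClientProtocol namenode;

        BadReplicaReporterSketch(ClientProtocol namenode) {
          this.namenode = namenode;
        }

        /**
         * Report a replica the client could not read (e.g. the DN answered
         * OP_READ_BLOCK with an error for a truncated block file), narrowed to
         * the one datanode that failed so only that replica is marked corrupt.
         */
        void reportUnreadableReplica(LocatedBlock block, DatanodeInfo badNode)
            throws IOException {
          LocatedBlock oneReplica =
              new LocatedBlock(block.getBlock(), new DatanodeInfo[] { badNode });
          namenode.reportBadBlocks(new LocatedBlock[] { oneReplica });
        }
      }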

        Activity

        Aaron T. Myers created issue -
        Aaron T. Myers added a comment -

        I've confirmed this issue exists in the 0.20-security branch, and I bet it exists in 0.23 as well but haven't checked yet.

        Harsh J made changes -
        Assignee Harsh J [ qwertymaniac ]
        Arpit Gupta added a comment -

        Here is the output of the dfs -cat call where it reports the node as dead:

        11/09/27 18:06:05 WARN hdfs.DFSClient: Failed to connect to /IP:1019, add to deadNodes and continue
        java.io.IOException: Got error for OP_READ_BLOCK, self=/IP:55657, remote=/IP:1019, for file /some_file.txt, for block -607102961416835735_7654

        11/09/27 18:06:05 INFO hdfs.DFSClient: Could not obtain block blk_-607102961416835735_7654 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry

        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2093)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1897)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2048)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
        at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
        at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
        at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:349)
        at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1913)
        at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)
        at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1557)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1776)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)

        And then, after this, if fsck is run it reports the filesystem as healthy. It should instead mark the filesystem as corrupt.

        Harsh J added a comment -

        (Issue affects trunk, and attached patch is against that.)

        Aaron/Arpit,

        An error in the OP_READ_BLOCK operation can also arise from xceiver load limits, apart from truncated block files and missing or bad-permission block files.

        The attached patch reports every error encountered, not just the final LocatedBlock tried. I do know this is wrong, as it'd spark a replication storm for a reason as simple as filled-up xceiver loads causing the read error – but let me know if I'm wrong, and I'll tweak the patch and the tests a bit to accommodate final-retry corrupt marking.

        Harsh J made changes -
        Attachment HDFS-2263.patch [ 12508641 ]
        Harsh J made changes -
        Target Version/s 0.24.0 [ 12317653 ]
        Harsh J made changes -
        Status Open [ 1 ] -> Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12508641/HDFS-2263.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 20 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hdfs.TestClientReportBadBlock
        org.apache.hadoop.hdfs.TestDFSClientRetries
        org.apache.hadoop.hdfs.TestDatanodeBlockScanner
        org.apache.hadoop.hdfs.TestPread
        org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1740//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1740//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1740//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1740//console

        This message is automatically generated.

        Todd Lipcon added a comment -

        Haven't looked at the patch, but in general we should only report "corrupt" if we have a verifiable case of a bad checksum. Other "generic errors" out of OP_READ_BLOCK shouldn't trigger a bad block being reported, for the reason Harsh mentioned, even if it's the "final retry" – e.g. maybe the client got partitioned from the DNs but not the NN. In that case we don't want it going and reporting bad blocks everywhere.
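        (A minimal sketch of the policy described above, assuming the read loop can tell org.apache.hadoop.fs.ChecksumException apart from other IOExceptions; the ReplicaReader interface and helper names are illustrative only, not from any patch:)

        import java.io.IOException;

        import org.apache.hadoop.fs.ChecksumException;

        /** Illustrative sketch of the "only report verified checksum failures" policy. */
        class ReadRetryPolicySketch {

          /** Stand-in for one attempt to read a replica from a datanode. */
          interface ReplicaReader {
            void read() throws IOException; // may throw ChecksumException on corrupt data
          }

          /** @return true if the replica should be reported to the NameNode as corrupt. */
          static boolean shouldReportCorrupt(ReplicaReader reader) {
            try {
              reader.read();
              return false;          // read succeeded
            } catch (ChecksumException ce) {
              return true;           // verified corruption: report this replica
            } catch (IOException ioe) {
              return false;          // generic OP_READ_BLOCK failure (xceiver load,
                                     // truncation, partition): just try another DN
            }
          }
        }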

        Harsh J added a comment -

        Understood, Todd. I believe the tests in TestClientReportBadBlock explain this too – I had not noticed that test before attempting this.

        Aaron - Can we resolve this as a won't fix?

        Aaron T. Myers added a comment -

        Can we not do as Todd suggested, i.e. only report "corrupt" if we have a verifiable case of a bad checksum? Is there some reason that's impossible to distinguish from other generic errors?

        Though this issue is relatively low priority, I don't think there's any reason we won't fix the issue, so I'd rather just leave it open.

        Harsh J added a comment -

        Ah, but checksum errors (changed block data, extra bits added to the start/end, etc.) are already reported properly. How would we equate a truncated/unreadable block to a checksum failure?

        Harsh J added a comment -

        Patch was invalid. Cancelling.

        Harsh J made changes -
        Status Patch Available [ 10002 ] -> Open [ 1 ]
        Harsh J added a comment -

        We'd need a way to detect a truncated/unreadable response from a DN in order to cover the problem here. Neither shows up as a checksum error, for some reason (at least the last time I checked).
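        (One conceivable way to treat truncation as verifiable, sketched here for illustration rather than taken from any patch: compare the bytes the DN actually served against the length the NameNode advertises via LocatedBlock.getBlockSize(); the class and parameter names are illustrative:)

        import org.apache.hadoop.hdfs.protocol.LocatedBlock;

        /** Illustrative sketch only; not part of any attached patch. */
        class TruncationCheckSketch {
          /** @return true if the DN served fewer bytes than the NameNode says the block holds. */
          static boolean looksTruncated(LocatedBlock block, long bytesServedByDatanode) {
            return bytesServedByDatanode < block.getBlockSize();
          }
        }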

        Transition                 Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available    131d 21h 23m            1                 Harsh J         26/Dec/11 21:31
        Patch Available -> Open    23d 7h 20m              1                 Harsh J         19/Jan/12 04:52

          People

          • Assignee: Harsh J
          • Reporter: Aaron T. Myers
          • Votes: 1
          • Watchers: 9
