Hadoop HDFS / HDFS-10627

Volume Scanner marks a block as "suspect" even if the exception is network-related

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.7.4, 3.0.0-alpha2
    • Component/s: hdfs
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      In the BlockSender code:

      BlockSender.java
              if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection reset")) {
                LOG.error("BlockSender.sendChunks() exception: ", e);
              }
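        // NOTE: the markSuspectBlock() call below sits OUTSIDE the guard
        // above, so it runs even when the IOException is just
        // "Broken pipe" or "Connection reset".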
              datanode.getBlockScanner().markSuspectBlock(
                    volumeRef.getVolume().getStorageID(),
                    block);
      

      Before HDFS-7686, the block was marked as suspect only if the exception message did not start with "Broken pipe" or "Connection reset".
      After HDFS-7686, the block is marked as suspect irrespective of the exception message: the markSuspectBlock() call is outside the conditional, so purely network-related failures also queue the block for a priority scan.

      On one of our datanodes, it took approximately a whole day (22 hours) to work through the backlog of suspect blocks before a single genuinely corrupt block was scanned.
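
      For comparison, here is a minimal sketch of the pre-HDFS-7686 logic that the attached HDFS-10627.patch restores, using the same variable names as the snippet above (the surrounding catch block and the ioem string are assumed from context):

      BlockSender.java (sketch of the restored behavior, not the verbatim patch)
        if (!ioem.startsWith("Broken pipe") && !ioem.startsWith("Connection reset")) {
          LOG.error("BlockSender.sendChunks() exception: ", e);
          // Only a local, non-network IOException marks the replica as
          // suspect and queues it for a priority scan.
          datanode.getBlockScanner().markSuspectBlock(
              volumeRef.getVolume().getStorageID(),
              block);
        }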

      Attachments

      1. HDFS-10627.patch (1.0 kB, Rushabh S Shah)

          Activity

          jojochuang Wei-Chiu Chuang added a comment -

          Rushabh S Shah this piece of code can be useful during a block transfer triggered by a prior pipeline recovery. In that case, if the receiver detects corruption in the replica, it immediately resets the connection.

          If the block transfer is not initiated by pipeline recovery, the receiver also notifies the NameNode that the source's replica is corrupt (this is actually not accurate, because the corruption may be due to other issues, which causes the bug described in HDFS-6804).

          In short, I think this code is still necessary, because of the lack of a feedback mechanism in block transfer during pipeline recovery.

          daryn Daryn Sharp added a comment -

          I agree it's unfortunate there's no feedback mechanism. I disagree that the serving node should ever assume that losing a client means the block might be corrupt. There are many reasons a client can "unexpectedly" close the connection: processes shut down unexpectedly, get killed, etc.

          The only time the DN should be suspicious of its own block is a local IOE. I checked some of our big clusters: a DN self-reporting a corrupt block during pipeline/block recovery is a fraction of a percent (and that's being generous).

          So let's consider... Is it worth backlogging a DN with so many false positives from broken connections that:

          1. It takes a day to scan a legitimately bad block detected by a local IOE
          2. A rack failure can then cause temporary data loss
          3. The scanner isn't doing its primary job of trawling the storage for bad blocks
          daryn Daryn Sharp added a comment -

          Rushabh and I checked a few random healthy nodes on multiple clusters.

          1. They are backlogged with thousands of suspect blocks on all storages. It will take days to catch up, assuming no more false positives.
          2. They have been up for months but haven't started a rescan in over a month (per the available non-archived logs). So obviously the false positives are trickling in faster than the scan rate.
          3. One node reported one bad block in the past month. The others have not reported any bad blocks.
          4. On a large cluster, only 0.08% of pipeline-recovery corruption was detected.

          The scanner is completely negligent in its duty to find and report bad blocks. Rushabh added the priority-scan feature to prevent a rack failure from causing (hopefully) temporary data loss. Instead of waiting up to a week for a suspected corrupt block to be reported, it would be reported almost immediately. Well, guess what happened today? Rack failed. Lost data. The DN knew the block was bad but was too backlogged to verify and report it. A completely avoidable situation, not worth detecting 0.08% of pipeline-recovery corruptions.

          This is completely broken and must be reverted to be fixed.

          shahrs87 Rushabh S Shah added a comment -

          Brought back the old behavior, so that the block sender marks a block as suspect only if the exception message doesn't start with "Broken pipe" or "Connection reset".

          jojochuang Wei-Chiu Chuang added a comment -

          Hi Daryn, I totally get that.

          Out of curiosity, why isn't a packet responder instantiated for block transfer operations? Looking at the code, a packet responder is only instantiated for pipeline writes.

          I was somewhat concerned about removing it, because Yongjun Zhang and I have been diagnosing a block corruption bug very similar to HDFS-4660 and HDFS-9220, and having the volume scanner called on to scan a suspect block is useful in those cases.

          yzhangal Yongjun Zhang added a comment -

          Thanks, guys, for the work here!

          When block corruption (a checksum error) is detected, whether in a pipeline write or in a block transfer during pipeline recovery, I would hope a checksum exception could be thrown and delivered back to the sender instead of just disconnecting. Is this totally not feasible here?

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 21s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 54s trunk passed
          +1 compile 0m 44s trunk passed
          +1 checkstyle 0m 26s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 41s trunk passed
          +1 javadoc 0m 55s trunk passed
          +1 mvninstall 0m 47s the patch passed
          +1 compile 0m 42s the patch passed
          +1 javac 0m 42s the patch passed
          +1 checkstyle 0m 23s the patch passed
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 48s the patch passed
          +1 javadoc 0m 53s the patch passed
          -1 unit 85m 12s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          104m 21s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.namenode.TestUpgradeDomainBlockPlacementPolicy
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2
            org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshot



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818222/HDFS-10627.patch
          JIRA Issue HDFS-10627
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux dda4ed9a372e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c48e9d6
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16070/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16070/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16070/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          daryn Daryn Sharp added a comment -

          Adding a feedback mechanism would be very useful, but should be a different jira. I'm sure it's harder than it seems. (I'm not sure why the packet responder isn't started. I think it may sometimes be an optimization, and/or it may have to do with recovery needing to copy a replica that appears to have tail corruption before truncating it. Not my area of expertise...)

          This jira, however, must restore the prior behavior so our clusters can actually detect bad blocks. Latent corruption is going undetected. Legitimate, detected corruption is queued for days, which increases the risk of data loss. DNs are too busy verifying false positives from clients that didn't fully read the stream.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          2.7.3 is in the release process; changing the target version to 2.7.4.

          daryn Daryn Sharp added a comment -

          +1 Will check in tomorrow afternoon unless there are further objections to discuss.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 21s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 29s trunk passed
          +1 compile 0m 44s trunk passed
          +1 checkstyle 0m 27s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 43s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 41s the patch passed
          +1 javac 0m 41s the patch passed
          +1 checkstyle 0m 24s the patch passed
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 49s the patch passed
          +1 javadoc 0m 53s the patch passed
          +1 unit 61m 15s hadoop-hdfs in the patch passed.
          +1 asflicense 0m 31s The patch does not generate ASF License warnings.
          81m 16s



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Issue HDFS-10627
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818222/HDFS-10627.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ffeffbf8a91c 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 736d33c
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16987/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16987/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          kihwal Kihwal Lee added a comment -

          "+1 Will check in tomorrow afternoon unless there are further objections to discuss."

          Daryn Sharp probably meant it in Mercury days (1408 hours). I say we waited long enough.

          jojochuang Wei-Chiu Chuang added a comment -

          I think this is okay. In a data transfer scenario (non-pipeline writing), if the destination detects corruption in the replica it receives, it reports the corruption to the namenode. That is a more accurate response than marking a block as suspect whenever a socket is broken.
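
          For context, a rough sketch of the receiver-side path described above, loosely modeled on BlockReceiver.verifyChunks() in the 2.x code base (names and structure are approximate, not the verbatim source):

          // Sketch only; simplified and approximate.
          try {
            clientChecksum.verifyChunkedSums(dataBuf, checksumBuf, clientname, 0);
          } catch (ChecksumException ce) {
            LOG.warn("Checksum error in " + block + " from " + inAddr, ce);
            if (srcDataNode != null && isDatanode) {
              // Data transfer from another DN (not a client pipeline write):
              // report the source's replica as corrupt to the NameNode.
              datanode.reportRemoteBadBlock(srcDataNode, block);
            }
            // Rethrowing tears down the connection, which the sending DN
            // observes as a network error ("Connection reset").
            throw new IOException("Unexpected checksum mismatch while writing "
                + block + " from " + inAddr, ce);
          }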

          kihwal Kihwal Lee added a comment -

          Committed this to trunk, branch-2, branch-2.8 and branch-2.7.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10644 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10644/)
          HDFS-10627. Volume Scanner marks a block as "suspect" even if the exception is network-related (kihwal: rev 5c0bffddc0cb824a8a2751bcd0dc3e15ce081727)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java

            People

            • Assignee: shahrs87 Rushabh S Shah
            • Reporter: shahrs87 Rushabh S Shah
            • Votes: 0
            • Watchers: 10
