Hadoop HDFS / HDFS-4660

Block corruption can happen during pipeline recovery

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.3-alpha, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 2.7.1, 2.6.4, 3.0.0-alpha1
    • Component/s: datanode
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      Pipeline: DN1, DN2, DN3
      Stop DN2.

      Pipeline recovery adds node DN4 at the 2nd position:
      DN1, DN4, DN3

      Recover RBW.
      DN4 after RBW recovery:
      2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
      2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
      getNumBytes() = 134144
      getBytesOnDisk() = 134144
      getVisibleLength()= 134144
      end at chunk (134144/512=262)

      DN3 after RBW recovery:
      2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
      2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
      getNumBytes() = 134028
      getBytesOnDisk() = 134028
      getVisibleLength()= 134028

      The client sends a packet after pipeline recovery:
      offset=133632 len=1008

      DN4 after flush
      2013-04-01 21:02:31,779 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1063
      // meta end position should be ceil(134640/512)*4 + 7 == 263*4 + 7 == 1059, but it is now 1063.

      DN3 after flush
      2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, lastPacketInBlock=false, offsetInBlock=134640, ackEnqueueNanoTime=8817026136871545)
      2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing meta file offset of block BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from 1055 to 1051
      2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1059

      After checking the meta file on DN4, I found that the checksum of chunk 262 is duplicated, but the data is not.
      Later, after the block was finalized, DN4's block scanner detected the bad block and reported it to the NN. The NN then sent a command to delete this block and re-replicate it from another DN in the pipeline to satisfy the replication factor.

      I think this is because BlockReceiver skips data bytes that were already written, but does not skip the checksum bytes that were already written. The function adjustCrcFilePosition() is only used for the last incomplete chunk, not for this situation.
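
      For clarity, here is a minimal sketch of the meta-file offset arithmetic used above, assuming the standard layout of a .meta file (a 7-byte header followed by one 4-byte CRC per 512-byte chunk); this is illustrative arithmetic only, not HDFS code:

      public class MetaOffsetCheck {
        public static void main(String[] args) {
          final long bytesPerChecksum = 512;  // data bytes covered by one checksum
          final long checksumSize = 4;        // size of one CRC entry in the .meta file
          final long headerSize = 7;          // meta file header size, as used in the description

          long dataLen = 134640;  // DN4 data-file offset after the flush
          // Number of checksummed chunks, including the trailing partial chunk.
          long chunks = (dataLen + bytesPerChecksum - 1) / bytesPerChecksum;  // 263
          long expectedMetaLen = headerSize + chunks * checksumSize;          // 7 + 263*4 = 1059

          // DN3 reports 1059 as expected; DN4 reports 1063 = 1059 + 4,
          // i.e. exactly one extra (duplicated) 4-byte checksum for chunk 262.
          System.out.println("expected meta offset = " + expectedMetaLen);
        }
      }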

      Attachments

      1. HDFS-4660.patch
        2 kB
        Peng Zhang
      2. HDFS-4660.patch
        8 kB
        Kihwal Lee
      3. HDFS-4660.v2.patch
        8 kB
        Kihwal Lee
      4. HDFS-4660.br26.patch
        8 kB
        Kihwal Lee
      5. periodic_hflush.patch
        3 kB
        Nathan Roberts

        Issue Links

          Activity

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10363 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10363/)
          HDFS-10652. Add a unit test for HDFS-4660. Contributed by Vinayakumar (yzhang: rev c25817159af17753b398956cfe6ff14984801b01)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          yzhangal Yongjun Zhang added a comment -

          Thank you very much Nathan Roberts!

          nroberts Nathan Roberts added a comment -

          Hi Yongjun Zhang. Had to go back to an old git stash, but I'll attach a sample patch to TeraOutputFormat.

          yzhangal Yongjun Zhang added a comment - - edited

          Hi Nathan Roberts,

          Thanks for your earlier work here. Would you please explain how you did the first step

          "Modify teragen to hflush() every 10000 records"

          in

          https://issues.apache.org/jira/browse/HDFS-4660?focusedCommentId=14542862&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14542862

          Thanks much.

          jojochuang Wei-Chiu Chuang added a comment -

          Hello Kihwal Lee, we are seeing a similar bug on a CDH5.5 cluster, which has this fix (HDFS-4660), so it may be a different bug. Would you please take a look at HDFS-10587? We've analyzed the log and reconstructed the sequence of events, and we are in the process of creating a unit test.

          Thanks!

          djp Junping Du added a comment -

          I have committed the 2.6 patch to branch-2.6. Thanks Kihwal Lee for updating the patch.

          kihwal Kihwal Lee added a comment -

          Attaching a 2.6 version of the patch.

          djp Junping Du added a comment -

          Hi Kihwal Lee, shall we backport this patch to 2.6.x branch?

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2159 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2159/)
          HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2177 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2177/)
          HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #229 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/229/)
          HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #220 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/220/)
          HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #231 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/231/)
          HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #961 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/961/)
          HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          vinayrpet Vinayakumar B added a comment -

          Quoting Kihwal Lee: "After stress testing using the setup mentioned above, we have deployed the fix to the production cluster that generated checksum errors frequently. We have not seen any corruption so far. We are confident that it fixes the issue."

          Thanks for the info and contribution Kihwal Lee.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8028 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8028/)
          HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          kihwal Kihwal Lee added a comment -

          Thanks for reports and reviews. I've committed this to trunk, branch-2 and branch-2.7.

          kihwal Kihwal Lee added a comment -

          Thanks, Nathan. With Vinay's binding +1 and Nathan's review, I will commit this.

          nroberts Nathan Roberts added a comment -

          +1 on the patch. I have reviewed the patch previously and it is currently running in production at scale.

          The stress test we ran against this in https://issues.apache.org/jira/browse/HDFS-4660?focusedCommentId=14542862&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14542862 heavily exercised this path.

          kihwal Kihwal Lee added a comment -

          Vinod Kumar Vavilapalli If the commit does not happen today, I will move it to 2.7.2.

          kihwal Kihwal Lee added a comment -

          Thanks for the review, Vinayakumar B. After stress testing using the setup mentioned above, we have deployed the fix to the production cluster that generated checksum errors frequently. We have not seen any corruption so far. We are confident that it fixes the issue.

          vinayrpet Vinayakumar B added a comment -

          To summarize the description: when the packet's offsetInBlock points back more than bytesPerChecksum (512) bytes behind the on-disk length, the checksum will have duplicates, but the data will not.

          Patch looks very nice with the detailed description.
          +1 LGTM.

          // Determine how many checksums need to be skipped up to the last
          // boundary. The checksum after the boundary was already counted
          // above. Only count the number of checksums skipped up to the
          // boundary here.

          The actual fix required was this, i.e., the number of checksum bytes to skip.

          But the patch also enhances readability along with the fix.

          For the calculation, I used the numbers given in the description (redone in the sketch at the end of this comment). It seems to me that the problem will be solved by this.

          I would love to see someone else confirm the calculation as well before going ahead with the commit.
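
          For reference, a minimal sketch of that calculation with the figures from the description (DN4 on-disk length 134144 after RBW recovery, resent packet at offsetInBlock 133632); this is illustrative arithmetic under those assumptions, not the BlockReceiver code:

          public class ChecksumSkipCheck {
            public static void main(String[] args) {
              final long bytesPerChecksum = 512;
              final long checksumSize = 4;
              long onDiskLen = 134144;     // DN4 replica length after RBW recovery
              long packetOffset = 133632;  // offsetInBlock of the resent packet

              long dataBytesToSkip = onDiskLen - packetOffset;  // 512 data bytes already on disk
              long checksumBytesToSkip =
                  (dataBytesToSkip / bytesPerChecksum) * checksumSize;  // 1 chunk -> 4 checksum bytes to skip

              // Skipping the 512 data bytes but not these 4 checksum bytes is what
              // duplicates chunk 262's CRC and pushes the meta offset from 1059 to 1063.
              System.out.println("checksum bytes to skip = " + checksumBytesToSkip);
            }
          }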

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Kihwal Lee, I see you marked this as a blocker for 2.7.1.

          Assuming you can get hold of someone's review bandwidth to get this done soonish, we are good. Otherwise, also given this is a long standing issue, I recommend we track this instead for 2.7.2. What do you think?

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 38s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 37s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 36s There were no new checkstyle issues.
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 39s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 3m 4s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 native 3m 16s Pre-build of native portion
          -1 hdfs tests 163m 24s Tests failed in hadoop-hdfs.
              204m 50s  



          Reason Tests
          Failed unit tests hadoop.hdfs.TestAppendSnapshotTruncate



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12733630/HDFS-4660.v2.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 0790275
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/11038/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/11038/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11038/testReport/
          Java 1.7.0_55
          uname Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11038/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 35s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 javac 7m 29s There were no new javac warning messages.
          +1 javadoc 9m 39s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 14s The applied patch generated 4 new checkstyle issues (total was 62, now 63).
          -1 whitespace 0m 0s The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse.
          +1 findbugs 3m 2s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 native 3m 13s Pre-build of native portion
          -1 hdfs tests 168m 19s Tests failed in hadoop-hdfs.
              211m 5s  



          Reason Tests
          Failed unit tests hadoop.tools.TestHdfsConfigFields



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12732698/HDFS-4660.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 281d47a
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10961/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/10961/artifact/patchprocess/whitespace.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10961/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10961/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10961/console

          This message was automatically generated.

          kihwal Kihwal Lee added a comment -

          We saw this kind of corruption happening when the copied partial block data does not end at a packet boundary. The un-acked packets are resent from the client, and if the end of the on-disk data is not aligned, corruption happens. This is very difficult to reproduce in a unit test without being too invasive.

          However, data corruption can be reproduced in a 10-node cluster. Here is how we reproduced it and verified the patch (credit goes to Nathan Roberts):

          • Modify teragen to hflush() every 10000 records (a rough sketch of such a record writer follows this list)
          • Change datanode WRITE_TIMEOUT_EXTENSION from 5000 ms to 1ms - allows socket write timeout config to have full control over the write timeout
          • Config dfs.datanode.socket.write.timeout to 2000ms
          • Config dfs.client.block.write.replace-datanode-on-failure.policy to ALWAYS so that write pipelines are always immediately reconstructed when a failure occurs
          • Run teragen with 100 maps, each outputting 10000000000
          • The success criterion is no "Checksum verification failed" messages in any datanode logs. This comes from the added checksum verification in recoverRbw(). A patch will be provided in HDFS-8395.
          • The write timeout is so aggressive that the teragen job will probably fail due to multiple, repeated failures eventually causing task attempts to fail; this is expected.
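
          The teragen modification in the first step could look roughly like the sketch below: a record writer that wraps the output stream and calls hflush() every 10000 records. This is only an illustration of the idea (the class name, field names, and the Text-based record format are assumptions here, not the attached periodic_hflush.patch):

          import java.io.IOException;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.RecordWriter;
          import org.apache.hadoop.mapreduce.TaskAttemptContext;

          public class PeriodicHflushRecordWriter extends RecordWriter<Text, Text> {
            private static final long FLUSH_INTERVAL = 10000;  // records between hflush() calls
            private final FSDataOutputStream out;
            private long written = 0;

            public PeriodicHflushRecordWriter(FSDataOutputStream out) {
              this.out = out;
            }

            @Override
            public void write(Text key, Text value) throws IOException {
              out.write(key.getBytes(), 0, key.getLength());
              out.write(value.getBytes(), 0, value.getLength());
              if (++written % FLUSH_INTERVAL == 0) {
                out.hflush();  // force the acked length to advance on all pipeline datanodes
              }
            }

            @Override
            public void close(TaskAttemptContext context) throws IOException {
              out.close();
            }
          }

          Together with the short dfs.datanode.socket.write.timeout and the ALWAYS replace-datanode policy from the other steps, this keeps pipelines being recovered mid-block, which is exactly the path the fix changes.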
          kihwal Kihwal Lee added a comment -

          Canceling the existing patch and assigning it to me.

          kihwal Kihwal Lee added a comment -

          Actually this is a serious data corruption issue. It is easily reproduced when the timeout is set short and data is written and flushed frequently. If sufficient load is applied, a timeout can occur and a pipeline recovery is triggered. If a new node is added, the partial block copy can make the ACKed size on the new node bigger than on the others. Although less likely, the same thing can happen without involving a new node. It can also happen in partial chunk cases, which the existing patch does not handle.

          I have a patch that was stress tested and internally reviewed. I am in the process of adding a unit test.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12576518/HDFS-4660.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / f1a152c
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10562/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12576518/HDFS-4660.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / f1a152c
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10537/console

          This message was automatically generated.

          peng.zhang Peng Zhang added a comment -

          If recoverRbw() truncates the block to the acked length, then in this issue's scenario, after pipeline recovery: len(DN1) = len(DN4) <= len(DN3).
          And then the client will continue from the acked offset (len(DN1)), so DN3 may fall into the same trap that DN4 hit before.

          IMHO, the boundary check is still needed to solve the duplicated checksum problem.

          kihwal Kihwal Lee added a comment -

          After recovery, DN4 may have more bytes than DN3.

          In HDFS-3875, such behavior causes checksum errors in the unacked portion of the data to go uncaught; they are later detected when the NN tries to asynchronously replicate the block. At that point, no one has a valid copy and the user experiences data loss. The latest proposed solution in HDFS-3875 is to truncate the block in recoverRbw(). The reasoning is that the unacked portion is not checksum-verified and thus cannot be trusted. Will this also address this issue?

          peng.zhang Peng Zhang added a comment -

          "Call hflush to ensure that all DNs have the full length"

          I think if this step happened, the bug would not be triggered.

          After the client calls hflush() without all DNs having acked, DN1 may have more bytes than the other DNs.
          So if DN2 dies and the newly added DN4 is placed at the 2nd position of the pipeline (controlled by the NN's pipeline sorting), it will recover the RBW replica from DN1.
          After recovery, DN4 may have more bytes than DN3.
          And the client will continue sending from the smallest offset for which it received acks.
          So this will cause DN4 to "receive a packet, part of which needs to be written and part needs to be skipped. When the amount of data to skip reaches the chunk size, the receiver doesn't skip the checksum and has it duplicated".

          Creating a test case at a high level may not be easy, because we need to control the DNs' file positions after hflush, and also DN4's location in the recovered pipeline.

          tlipcon Todd Lipcon added a comment -

          Can you create a functional test which does something like this? (A rough sketch follows the list.)

          • Create a pipeline and write a number of bytes which isn't an exact multiple of the checksum chunk size (e.g. 800 bytes).
          • Call hflush to ensure that all DNs have the full length
          • Restart the second DN in the pipeline, to trigger adding DN4
          • Write a bit more and close the file.
          • Verify that all replicas have identical checksum files.
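
          A rough sketch of that shape of test, assuming a MiniDFSCluster; the replica/checksum comparison at the end is only outlined because it needs datanode-internal access (everything below is illustrative, not the test that was eventually committed):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.hdfs.DistributedFileSystem;
          import org.apache.hadoop.hdfs.HdfsConfiguration;
          import org.apache.hadoop.hdfs.MiniDFSCluster;

          public class PipelineRecoveryChecksumSketch {
            public static void main(String[] args) throws Exception {
              Configuration conf = new HdfsConfiguration();
              // Always add a replacement datanode when a pipeline node fails.
              conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "ALWAYS");
              MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(4).build();
              try {
                DistributedFileSystem fs = cluster.getFileSystem();
                Path file = new Path("/pipeline-recovery-checksum");
                FSDataOutputStream out = fs.create(file, (short) 3);

                out.write(new byte[800]);   // not a multiple of the 512-byte chunk size
                out.hflush();               // all pipeline DNs now hold the full 800 bytes

                cluster.restartDataNode(1); // restart one datanode (ideally the 2nd in the pipeline) to trigger adding DN4

                out.write(new byte[1024]);  // write a bit more through the recovered pipeline
                out.close();

                // Verification idea: locate each replica's block and .meta files
                // (e.g. via the cluster's block-file helpers) and assert the
                // checksum files are byte-for-byte identical; omitted here.
              } finally {
                cluster.shutdown();
              }
            }
          }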
          peng.zhang Peng Zhang added a comment -

          The scenario for this issue is simple: the client sends a packet, part of which needs to be written and part needs to be skipped. When the amount of data to skip reaches the chunk size, the receiver doesn't skip the checksum and has it duplicated.

          But for a unit test, I found that receivePacket() has many dependencies and there were no tests for it before. So I think it's not easy to add unit tests for it.

          Any good ideas, Todd?

          tlipcon Todd Lipcon added a comment -

          Hi Peng. Can you please see if you can add a unit test for this?

          peng.zhang Peng Zhang added a comment -

          Suresh Srinivas, please help review this issue.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12576518/HDFS-4660.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.fs.TestFcHdfsSymlink

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4179//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4179//console

          This message is automatically generated.

          peng.zhang Peng Zhang added a comment -

          Patch for trunk.


            People

             • Assignee: kihwal Kihwal Lee
             • Reporter: peng.zhang Peng Zhang
             • Votes: 0
             • Watchers: 28

              Dates

              • Created:
                Updated:
                Resolved:

                Development