Hadoop HDFS / HDFS-11472

Fix inconsistent replica size after a data pipeline failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 2.7.4, 3.0.0-beta1, 2.8.2
    • Component/s: datanode
    • Labels:
      None

      Description

      We observed a case where a replica's on-disk length was less than its acknowledged length, breaking an assumption in the recovery code.

      2017-01-08 01:41:03,532 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to obtain replica info for block (=BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394519586) from datanode (=DatanodeInfoWithStorage[10.204.138.17:1004,null,null])
      java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: getBytesOnDisk() < getVisibleLength(), rip=ReplicaBeingWritten, blk_2526438952_1101394519586, RBW
        getNumBytes()     = 27530
        getBytesOnDisk()  = 27006
        getVisibleLength()= 27268
        getVolume()       = /data/6/hdfs/datanode/current
        getBlockFile()    = /data/6/hdfs/datanode/current/BP-947993742-10.204.0.136-1362248978912/current/rbw/blk_2526438952
        bytesAcked=27268
        bytesOnDisk=27006
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2284)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2260)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:2566)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.callInitReplicaRecovery(DataNode.java:2577)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2645)
              at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:245)
              at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2551)
              at java.lang.Thread.run(Thread.java:745)
      

      It turns out that if an exception is thrown within BlockReceiver#receivePacket, the replica's in-memory on-disk length may not be updated, even though the data is written to disk anyway.

      For example, here is one such exception we observed:

      2017-01-08 01:40:59,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394499067
      java.nio.channels.ClosedByInterruptException
              at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
              at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
              at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1484)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:994)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:670)
              at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:857)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:797)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
              at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
              at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
              at java.lang.Thread.run(Thread.java:745)
      

      There are potentially other places and causes where an exception can be thrown within BlockReceiver#receivePacket, so it may not make much sense to work around this particular exception. Instead, we should improve the replica recovery code to handle the case where the on-disk size is less than the acknowledged size, and update the in-memory checksum accordingly.
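The direction proposed here can be sketched as a small decision rule: given the in-memory on-disk length, the acknowledged length, and the actual block file length, decide how recovery should reconcile them. The class and method names below are hypothetical, made up for illustration; this is not the actual FsDatasetImpl code.

```java
// Hypothetical sketch of the reconciliation rule proposed in this issue;
// the class and method names are made up and do not exist in HDFS.
public class ReplicaRecoverySketch {

    /**
     * Returns the length recovery should truncate the replica to, or -1
     * if recovery must fail because even the block file on disk is
     * shorter than the acknowledged length.
     */
    public static long reconcile(long bytesOnDisk, long bytesAcked,
                                 long blockFileLength) {
        if (bytesOnDisk >= bytesAcked) {
            // In-memory state is consistent with what was acknowledged.
            return bytesAcked;
        }
        // bytesOnDisk fell behind, e.g. because an exception in
        // BlockReceiver#receivePacket skipped the in-memory update
        // after the data had already been written.
        if (blockFileLength >= bytesAcked) {
            // The data did reach the disk: trust the file, truncate it to
            // the acknowledged length, and fix the in-memory bytesOnDisk.
            return bytesAcked;
        }
        // The acknowledged bytes were never persisted; recovery fails.
        return -1;
    }
}
```

With the lengths from the stack trace above (bytesOnDisk=27006, bytesAcked=27268, file length 27530), this rule would truncate to 27268 rather than throw.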

      1. HDFS-11472.001.patch
        10 kB
        Wei-Chiu Chuang
      2. HDFS-11472.002.patch
        10 kB
        Wei-Chiu Chuang
      3. HDFS-11472.003.patch
        10 kB
        Wei-Chiu Chuang
      4. HDFS-11472.004.patch
        8 kB
        Erik Krogen
      5. HDFS-11472.005.patch
        8 kB
        Erik Krogen
      6. HDFS-11472.testcase.patch
        2 kB
        Wei-Chiu Chuang
      7. HDFS-11472-branch-2.005.patch
        8 kB
        Erik Krogen
      8. HDFS-11472-branch-2.7.005.patch
        7 kB
        Erik Krogen
      9. HDFS-11472-branch-2.8.005.patch
        8 kB
        Erik Krogen

        Activity

        jojochuang Wei-Chiu Chuang added a comment -

        IMHO, this boils down to the fact that replica recovery does not consider the case where the on-disk length can be less than the acknowledged length.

        Attach a sample test to reproduce the replica recovery bug.

        jojochuang Wei-Chiu Chuang added a comment -

        Inviting folks to comment. It may not be a good idea to be too clever about the error handling, but we had a data loss incident due to this issue, so I want to see if we can alleviate this problem using this approach.

        jojochuang Wei-Chiu Chuang added a comment -

        Attach my first patch.

        This patch adds extra handling in FsDatasetImpl#recoverRbwImpl for the case where the on-disk length is less than the acknowledged length. In that case, it looks at the block file, and if the data written to the block file is more than the acknowledged length, it updates the in-memory bytesOnDisk and truncates the block file to match the acknowledged length.
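Truncating the block file to the acknowledged length also implies keeping the checksum (meta) file consistent with the new data length. Here is a sketch of the length arithmetic involved, assuming the usual HDFS defaults (512-byte checksum chunks, 4-byte CRC32 values, 7-byte meta file header); these constants are assumptions for illustration, not values taken from the patch.

```java
// Sketch of the meta (checksum) file length that matches a data file
// truncated to a given length. The constants below are assumed HDFS
// defaults, not values taken from this patch.
public class TruncateSketch {
    static final int BYTES_PER_CHECKSUM = 512; // bytes covered per CRC
    static final int CHECKSUM_SIZE = 4;        // CRC32 is 4 bytes
    static final int META_HEADER_SIZE = 7;     // meta file header bytes

    /** Meta file length consistent with a data file of newDataLen bytes. */
    public static long metaLengthFor(long newDataLen) {
        // One checksum per (possibly partial) chunk of data.
        long chunks = (newDataLen + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        return META_HEADER_SIZE + chunks * CHECKSUM_SIZE;
    }
}
```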

        The test case verifies two scenarios: (1) block file length >= acknowledged length; (2) block file length < acknowledged length. In the latter case the recovery attempt will fail.

        Looking forward to comments. Thanks!

        jojochuang Wei-Chiu Chuang added a comment -

        Forgot to submit my patch...

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 19s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 14m 28s trunk passed
        +1 compile 0m 48s trunk passed
        +1 checkstyle 0m 38s trunk passed
        +1 mvnsite 0m 56s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 2m 2s trunk passed
        +1 javadoc 0m 45s trunk passed
        +1 mvninstall 0m 57s the patch passed
        +1 compile 0m 48s the patch passed
        +1 javac 0m 48s the patch passed
        -0 checkstyle 0m 34s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 121 unchanged - 1 fixed = 123 total (was 122)
        +1 mvnsite 0m 48s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 49s the patch passed
        +1 javadoc 0m 38s the patch passed
        +1 unit 67m 57s hadoop-hdfs in the patch passed.
        +1 asflicense 0m 18s The patch does not generate ASF License warnings.
        95m 30s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11472
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12860940/HDFS-11472.001.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux aff13cc0a83d 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / c8bd5fc
        Default Java 1.8.0_121
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18910/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18910/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18910/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xkrogen Erik Krogen added a comment -

        Hey Wei-Chiu Chuang, trying to make sure I understand this patch. IIUC we basically want to address the potential discrepancy between what is actually on disk and the in-memory idea of what is on disk (getBytesOnDisk()). The changes to FsDatasetImpl#recoverRbwImpl() seem reasonable and the test seems good.

        I'm less sure of the change to FsDatasetImpl#initReplicaRecoveryImpl(). If the actual number of bytes on disk is less than getVisibleLength(), we should throw an error, right? Currently this may not be the case if we only WARN about getBytesOnDisk() < getVisibleLength(). It seems in that case we should then check getBlockDataLength() < getVisibleLength().

        jojochuang Wei-Chiu Chuang added a comment -

        Updated the patch to address the comment.
        Also made a minor change in FsDatasetImpl#recoverRbwImpl to check whether the rbw is a ReplicaBeingWritten, and to skip truncation if it is not.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 31s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 13m 30s trunk passed
        +1 compile 0m 48s trunk passed
        +1 checkstyle 0m 36s trunk passed
        +1 mvnsite 0m 52s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 1m 39s trunk passed
        +1 javadoc 0m 41s trunk passed
        +1 mvninstall 0m 48s the patch passed
        +1 compile 0m 45s the patch passed
        +1 javac 0m 45s the patch passed
        -0 checkstyle 0m 34s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 120 unchanged - 1 fixed = 122 total (was 121)
        +1 mvnsite 0m 49s the patch passed
        +1 mvneclipse 0m 12s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 45s the patch passed
        +1 javadoc 0m 37s the patch passed
        -1 unit 74m 44s hadoop-hdfs in the patch failed.
        -1 asflicense 0m 28s The patch generated 1 ASF License warnings.
        101m 3s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
          hadoop.hdfs.TestErasureCodeBenchmarkThroughput
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
          hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.server.datanode.TestDirectoryScanner



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11472
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12870896/HDFS-11472.002.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 8fd774ea4781 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 7101477
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19735/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19735/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19735/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/19735/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19735/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xkrogen Erik Krogen added a comment -

        Wei-Chiu Chuang in the v002 patch I see your change to check for RBW, but I don't see any changes related to my comment; did you forget to include them?

        jojochuang Wei-Chiu Chuang added a comment -

        Rev 003. I am really sorry about that. I thought I had removed that part. Attached a new patch to address the comment, for real.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 15m 55s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
        +1 mvninstall 13m 31s trunk passed
        +1 compile 0m 48s trunk passed
        +1 checkstyle 0m 36s trunk passed
        +1 mvnsite 0m 52s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 1m 37s trunk passed
        +1 javadoc 0m 40s trunk passed
        +1 mvninstall 0m 47s the patch passed
        +1 compile 0m 44s the patch passed
        +1 javac 0m 44s the patch passed
        -0 checkstyle 0m 34s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 120 unchanged - 1 fixed = 122 total (was 121)
        +1 mvnsite 0m 50s the patch passed
        +1 mvneclipse 0m 12s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 43s the patch passed
        +1 javadoc 0m 37s the patch passed
        -1 unit 66m 25s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 20s The patch does not generate ASF License warnings.
        107m 53s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11472
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12871029/HDFS-11472.003.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux f98298ced575 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 73ecb19
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19755/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19755/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19755/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19755/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xkrogen Erik Krogen added a comment -

        Wei-Chiu Chuang no problem! So I was actually wondering if, following the same reasoning as recoverRbwImpl, it may be better for initReplicaRecoveryImpl to check blockDataLength if bytesOnDisk is unexpected, something like this:

              //check replica bytes on disk.
              long bytesOnDisk = replica.getBytesOnDisk();
              if (bytesOnDisk < replica.getVisibleLength()) {
                long dataLength = replica.getBlockDataLength();
                if (bytesOnDisk != dataLength) {
                  LOG.warn("replica recovery: replica.getBytesOnDisk() = " +
                      replica.getBytesOnDisk() + " != " +
                      "replica.getBlockDataLength() = " + dataLength +
                      ", replica = " + replica);
                  rip.setLastChecksumAndDataLen(dataLength, null);
                }
                if (replica.getBytesOnDisk() < replica.getVisibleLength()) {
                  throw new IOException("THIS IS NOT SUPPOSED TO HAPPEN:"
                      + " getBytesOnDisk() < getVisibleLength(), rip=" + replica);
                }
              }
        

        Do you think this makes sense?

        jojochuang Wei-Chiu Chuang added a comment -

        Hi Erik Krogen. Thanks for the comment.
        I think there's no harm in adding the extra warning, as it is still possible that a similar error creeps in even after this fix.

        I am not so sure about resetting the replica's in-memory last chunk checksum (LCC). After the recovery initiated by initReplicaRecoveryImpl, the block may be read, and if the LCC does not match the data in the last chunk, the reader would erroneously believe the block is corrupt, which defeats the purpose of the fix.

        The reason that recoverRbwImpl resets the replica's in-memory LCC to null is that after the recovery the block is immediately written to, so the LCC won't match the chunk data anyway (the LCC is updated after the block is finalized), and there is little benefit in making the LCC correct.
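The mismatch described above is easy to demonstrate: a checksum computed over data of one length will generally not verify against a different length, which is why a stale last chunk checksum would make a reader flag corruption. A minimal illustration using plain java.util.zip.CRC32 (not the DataChecksum machinery HDFS actually uses):

```java
import java.util.zip.CRC32;

// Minimal illustration: a checksum over a chunk of one length does not
// verify data of another length, so a stale last-chunk checksum would
// make a reader report corruption. Uses plain CRC32, not HDFS's
// DataChecksum machinery.
public class LccSketch {
    /** CRC32 over the first len bytes of data. */
    public static long crcOf(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return crc.getValue();
    }
}
```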

        xkrogen Erik Krogen added a comment -

        Wei-Chiu Chuang I see the issue, thank you for the explanation. But the original stack trace you posted is from initReplicaRecoveryImpl, right? So your current patch would not solve the issue? Should we instead reread the checksum from disk (assuming that if the length was not updated in memory due to exception, probably the checksum also was not) and then set it via rip.setLastChecksumAndDataLen()?

        jojochuang Wei-Chiu Chuang added a comment -

        Okay I see. Thanks Erik.
        The reason I added the fix in recoverRbwImpl rather than in initReplicaRecoveryImpl is that the former happens before the latter.
        Right after the block receiver throws the exception and breaks the data pipeline, the client tries to recover the pipeline and continue appending to the block. This is where recoverRbwImpl comes into play.
        initReplicaRecoveryImpl comes into play when the client tries to recover the lease (for example, if an HBase region server can't close a WAL file for whatever reason, the file has zero replicas; it then issues a recoverLease request, which in turn recovers the block replicas).

        But I agree with you. I feel it's okay to add the extra check and recovery code just in case.

        shv Konstantin Shvachko added a comment -

        Hey Wei-Chiu Chuang, good find. Looked at your patch.

        1. So what happens if numBytes == bytesAcked == bytesOnDisk, but blockFileLength < bytesAcked? If I understand correctly, with your patch you will not truncate the replica file and will then lose the bytes in excess of blockFileLength.
        2. Do we even need complex conditions in recoverRbwImpl() calculating needTruncate? We can just truncateBlock(bytesAcked) whenever numBytes > bytesAcked, which will check the actual file size and throw if the size is wrong.
        3. Also when you throw exception it should be ReplicaNotFoundException rather than IOException.
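        To make the inconsistency concrete, here is a minimal, self-contained model of the three replica lengths from the stack trace in the description; this is not HDFS code, and the class and method names are invented purely for illustration. The recovery code expects bytesOnDisk <= bytesAcked <= numBytes; with the reported values, truncating to bytesAcked is impossible because the file is already shorter than that.

```java
/**
 * Illustrative model (not HDFS code) of the three replica lengths.
 * Recovery assumes the invariant: bytesOnDisk <= bytesAcked <= numBytes.
 */
public class ReplicaLengths {
    final long numBytes;     // in-memory replica length
    final long bytesAcked;   // bytes acknowledged to the client
    final long bytesOnDisk;  // bytes actually flushed to the block file

    ReplicaLengths(long numBytes, long bytesAcked, long bytesOnDisk) {
        this.numBytes = numBytes;
        this.bytesAcked = bytesAcked;
        this.bytesOnDisk = bytesOnDisk;
    }

    /** RBW recovery keeps only acknowledged bytes, so a longer replica must shrink. */
    boolean needsTruncate() {
        return numBytes > bytesAcked;
    }

    /** Truncating to bytesAcked only works if the file actually holds that many bytes. */
    boolean truncateWouldFail() {
        return bytesOnDisk < bytesAcked;
    }

    public static void main(String[] args) {
        // The exact values from the reported failure.
        ReplicaLengths r = new ReplicaLengths(27530, 27268, 27006);
        System.out.println(r.needsTruncate());      // the replica must be truncated
        System.out.println(r.truncateWouldFail());  // but 27006 < 27268, so it cannot be
    }
}
```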
        xkrogen Erik Krogen added a comment -

        Hey Wei-Chiu Chuang, first off thank you for your last explanation; makes sense. Are you still actively working on this? We would really like to get this patch in for the 2.7.4 release. If you do not have time right now I can try to take it up to address the last outstanding issues. Let me know.

        jojochuang Wei-Chiu Chuang added a comment -

        Hi Erik, thanks for the offer.
        Sorry I couldn't get back to this. Feel free to reassign it and move it out of the release blockers.
        Very sorry about that.

        xkrogen Erik Krogen added a comment -

        Hey Wei-Chiu Chuang, no problem, I'm happy to take over. It gave me good incentive to learn this portion of the code in more depth.

        I'm posting a v004 patch that addresses Konstantin Shvachko's comments and simplifies the logic a little by handling the different kinds of inconsistency sequentially. I reused your test case.

        I also changed the message in the IOException thrown when getBytesOnDisk() < getVisibleLength() inside of FsDatasetImpl#initReplicaRecoveryImpl() to remove the "THIS IS NOT SUPPOSED TO HAPPEN". Given that we ack bytes before writing them to disk (unless sync was requested) inside of BlockReceiver#receivePacket(), the situation where getBytesOnDisk() < getVisibleLength() can occur as part of normal operation, so this error message was misleading.
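        The ack-before-flush ordering described above can be sketched as follows. This is an illustrative model, not the actual BlockReceiver code: it only shows that if the ack is sent before the flush completes, a pipeline failure between the two steps leaves bytesOnDisk < visibleLength as part of normal operation.

```java
/**
 * Illustrative model (not BlockReceiver itself) of why an ack can
 * precede the disk write, so bytesOnDisk < visibleLength is reachable
 * without any bug.
 */
public class AckBeforeWrite {
    long visibleLength = 0;  // advanced when the packet is acked
    long bytesOnDisk = 0;    // advanced only after the flush completes

    void receivePacket(long packetLen, boolean failBeforeFlush) {
        visibleLength += packetLen;   // step 1: ack to the client (no sync requested)
        if (failBeforeFlush) {
            return;                   // pipeline breaks between ack and flush
        }
        bytesOnDisk += packetLen;     // step 2: flush to the block file
    }

    public static void main(String[] args) {
        AckBeforeWrite dn = new AckBeforeWrite();
        dn.receivePacket(27006, false);  // normal packet: acked and flushed
        dn.receivePacket(262, true);     // failure between ack and flush
        // visibleLength (27268) now exceeds bytesOnDisk (27006): the state
        // the original "THIS IS NOT SUPPOSED TO HAPPEN" message complained about.
        System.out.println(dn.visibleLength + " " + dn.bytesOnDisk);  // 27268 27006
    }
}
```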

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 10s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
              trunk Compile Tests
        +1 mvninstall 13m 18s trunk passed
        +1 compile 0m 49s trunk passed
        +1 checkstyle 0m 36s trunk passed
        +1 mvnsite 0m 57s trunk passed
        -1 findbugs 1m 53s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
        +1 javadoc 0m 45s trunk passed
              Patch Compile Tests
        +1 mvninstall 0m 55s the patch passed
        +1 compile 0m 54s the patch passed
        +1 javac 0m 54s the patch passed
        -0 checkstyle 0m 35s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 109 unchanged - 1 fixed = 110 total (was 110)
        +1 mvnsite 0m 59s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 52s the patch passed
        +1 javadoc 0m 37s the patch passed
              Other Tests
        +1 unit 66m 58s hadoop-hdfs in the patch passed.
        +1 asflicense 0m 19s The patch does not generate ASF License warnings.
        92m 54s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11472
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877899/HDFS-11472.004.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 08c5d18465ee 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 5aa2bf2
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20331/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20331/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20331/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20331/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        shv Konstantin Shvachko added a comment -

        Erik Krogen this should work correctly. I would change one condition:

        -        if (numBytes > bytesAcked) {
        +        if (bytesOnDisk > bytesAcked) {
        

        because during physical recovery we do not care about the in-memory numBytes, but rather about the actual block file length, i.e. bytesOnDisk.
        It would also be good if you could add a JavaDoc comment to the test explaining what it tests, as an example of good coding style.
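        The resulting recovery rule can be sketched as follows. This is a hedged, simplified sketch, not the actual FsDatasetImpl code: the method name and signature are invented, and it only captures the two checks under discussion (fail if the block file is shorter than bytesAcked, and truncate to bytesAcked whenever the on-disk length exceeds it).

```java
import java.io.IOException;

/**
 * Simplified sketch (not FsDatasetImpl) of the RBW recovery rule after
 * the review comments: truncation is driven by bytesOnDisk, not numBytes.
 */
public class RbwRecoverySketch {
    /** Returns the length the replica should have after RBW recovery. */
    static long recover(long bytesAcked, long bytesOnDisk, long blockFileLength)
            throws IOException {
        // The block file must hold at least the acknowledged bytes;
        // otherwise acknowledged data has been lost and recovery must fail.
        if (blockFileLength < bytesAcked) {
            throw new IOException("block file is shorter than the acked length");
        }
        // Truncate whenever the on-disk length exceeds what was acked,
        // regardless of the in-memory numBytes.
        if (bytesOnDisk > bytesAcked) {
            return bytesAcked;   // i.e. truncateBlock(bytesAcked)
        }
        return bytesOnDisk;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(recover(27268, 27300, 27300));  // truncated to 27268
        System.out.println(recover(27268, 27268, 27268));  // unchanged: 27268
    }
}
```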

        xkrogen Erik Krogen added a comment -

        Thanks for the review Konstantin Shvachko. Agreed with you on the s/numBytes/bytesOnDisk/ change. I added a Javadoc comment to the test as well. Attached v005 patch.

        shv Konstantin Shvachko added a comment -

        +1 on the latest patch.
        Patches for the other branches are needed as well. It does not apply cleanly even to branch-2.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
              trunk Compile Tests
        +1 mvninstall 14m 4s trunk passed
        +1 compile 0m 49s trunk passed
        +1 checkstyle 0m 36s trunk passed
        +1 mvnsite 0m 54s trunk passed
        -1 findbugs 1m 40s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
        +1 javadoc 0m 40s trunk passed
              Patch Compile Tests
        +1 mvninstall 0m 51s the patch passed
        +1 compile 0m 59s the patch passed
        +1 javac 0m 59s the patch passed
        -0 checkstyle 0m 34s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 109 unchanged - 1 fixed = 110 total (was 110)
        +1 mvnsite 1m 1s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 47s the patch passed
        +1 javadoc 0m 38s the patch passed
              Other Tests
        -1 unit 70m 20s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 19s The patch does not generate ASF License warnings.
        96m 40s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160
          hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11472
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12878064/HDFS-11472.005.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 406f8bb6f1f5 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / c21c260
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/20342/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/20342/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/20342/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/20342/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/20342/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xkrogen Erik Krogen added a comment -

        Attaching patches for branch-2, branch-2.8, and branch-2.7. They all had conflicts due to very minor differences near the block-truncation code in each version.

        Looking at Jenkins, the test failures do not seem related; I am unable to reproduce them.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12040 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12040/)
        HDFS-11472. Fix inconsistent replica size after a data pipeline failure. (shv: rev 8c2c8128328d465ec9699d0573bef69019742512)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImplTestUtils.java
        shv Konstantin Shvachko added a comment -

        I just committed this to trunk and branches 2, 2.8, and 2.7.
        Thank you Erik Krogen and Wei-Chiu Chuang.


          People

          • Assignee:
            xkrogen Erik Krogen
            Reporter:
            jojochuang Wei-Chiu Chuang
          • Votes:
            0
            Watchers:
            15
