Hadoop HDFS / HDFS-10960

TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails at disk error verification after volume remove

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha2
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: hdfs
    • Labels:
      None

      Description

      TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails occasionally in the following verification.

        700     // If an IOException thrown from BlockReceiver#run, it triggers
        701     // DataNode#checkDiskError(). So we can test whether checkDiskError() is called,
        702     // to see whether there is IOException in BlockReceiver#run().
        703     assertEquals(lastTimeDiskErrorCheck, dn.getLastDiskErrorCheck());
        704 
      
      Error Message
      
      expected:<0> but was:<6498109>
      Stacktrace
      
      java.lang.AssertionError: expected:<0> but was:<6498109>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.junit.Assert.assertEquals(Assert.java:555)
      	at org.junit.Assert.assertEquals(Assert.java:542)
      	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWrittenForDatanode(TestDataNodeHotSwapVolumes.java:703)
      	at org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten(TestDataNodeHotSwapVolumes.java:620)
      
      
      1. HDFS-10960.01.patch
        2 kB
        Manoj Govindassamy
      2. HDFS-10960.02.patch
        2 kB
        Manoj Govindassamy

        Activity

        eddyxu Lei (Eddy) Xu added a comment -

        Reworked: committed the 01 patch to branch-2 and branch-2.8.

        Thanks Kihwal Lee and Manoj Govindassamy for working closely on this patch.

        manojg Manoj Govindassamy added a comment -

        Tested the 01 patch on both branch-2 and branch-2.8; both build cleanly and the test passes.

        manojg Manoj Govindassamy added a comment -

        Lei (Eddy) Xu, the v01 patch should work there, as it uses getBasePath() instead of getBaseURI().

        eddyxu Lei (Eddy) Xu added a comment -

        Hi, Kihwal Lee

        Thanks for reporting it. Working on fixing it now.

        kihwal Kihwal Lee added a comment -

        Reverted from branch-2 and branch-2.8. Please rework the patch for these branches.

        kihwal Kihwal Lee added a comment -

        branch-2 build fails.

        [ERROR] /home1/kihwal/devel/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java:[710,38] cannot find symbol
        [ERROR] symbol:   method getBaseURI()
        [ERROR] location: interface org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi
        
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10615 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10615/)
        HDFS-10960. TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten (lei: rev 8c520a27cbd9daba05367d3a83017a2eab5258eb)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java
        eddyxu Lei (Eddy) Xu added a comment -

        +1 for the latest patch.

        Committed to trunk, branch-2 and 2.8

        Thanks, Manoj Govindassamy!

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote  Subsystem    Runtime   Comment
        0     reexec       0m 15s    Docker mode activated.
        +1    @author      0m 0s     The patch does not contain any @author tags.
        +1    test4tests   0m 0s     The patch appears to include 1 new or modified test files.
        +1    mvninstall   8m 4s     trunk passed
        +1    compile      0m 59s    trunk passed
        +1    checkstyle   0m 30s    trunk passed
        +1    mvnsite      1m 8s     trunk passed
        +1    mvneclipse   0m 13s    trunk passed
        +1    findbugs     2m 1s     trunk passed
        +1    javadoc      0m 44s    trunk passed
        +1    mvninstall   0m 45s    the patch passed
        +1    compile      0m 43s    the patch passed
        +1    javac        0m 43s    the patch passed
        +1    checkstyle   0m 23s    hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 14 unchanged - 1 fixed = 14 total (was 15)
        +1    mvnsite      0m 49s    the patch passed
        +1    mvneclipse   0m 9s     the patch passed
        +1    whitespace   0m 0s     The patch has no whitespace issues.
        +1    findbugs     1m 51s    the patch passed
        +1    javadoc      0m 37s    the patch passed
        +1    unit         60m 36s   hadoop-hdfs in the patch passed.
        +1    asflicense   0m 19s    The patch does not generate ASF License warnings.
                           81m 18s

        Subsystem        Report/Notes
        Docker           Image:yetus/hadoop:9560f25
        JIRA Issue       HDFS-10960
        JIRA Patch URL   https://issues.apache.org/jira/secure/attachment/12833422/HDFS-10960.02.patch
        Optional Tests   asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname            Linux 3f5d823996bd 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool       maven
        Personality      /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision     trunk / 701c27a
        Default Java     1.8.0_101
        findbugs         v3.0.0
        Test Results     https://builds.apache.org/job/PreCommit-HDFS-Build/17163/testReport/
        modules          C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output   https://builds.apache.org/job/PreCommit-HDFS-Build/17163/console
        Powered by       Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        manojg Manoj Govindassamy added a comment -

        Thanks for the review, Lei (Eddy) Xu. Attached the v02 patch, rebased on the latest trunk.

        eddyxu Lei (Eddy) Xu added a comment -

        Hi, Manoj Govindassamy
        The patch fails to build on the newest trunk. You might need to rebase it.

        Besides that, +1 pending. It is a nice fix. Thanks!

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote  Subsystem    Runtime   Comment
        0     reexec       0m 14s    Docker mode activated.
        +1    @author      0m 0s     The patch does not contain any @author tags.
        +1    test4tests   0m 0s     The patch appears to include 1 new or modified test files.
        +1    mvninstall   7m 13s    trunk passed
        +1    compile      0m 47s    trunk passed
        +1    checkstyle   0m 26s    trunk passed
        +1    mvnsite      0m 54s    trunk passed
        +1    mvneclipse   0m 12s    trunk passed
        +1    findbugs     1m 43s    trunk passed
        +1    javadoc      0m 55s    trunk passed
        +1    mvninstall   0m 53s    the patch passed
        +1    compile      0m 44s    the patch passed
        +1    javac        0m 44s    the patch passed
        +1    checkstyle   0m 23s    hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 14 unchanged - 1 fixed = 14 total (was 15)
        +1    mvnsite      0m 48s    the patch passed
        +1    mvneclipse   0m 10s    the patch passed
        +1    whitespace   0m 0s     The patch has no whitespace issues.
        +1    findbugs     1m 47s    the patch passed
        +1    javadoc      0m 52s    the patch passed
        +1    unit         60m 57s   hadoop-hdfs in the patch passed.
        +1    asflicense   0m 19s    The patch does not generate ASF License warnings.
                           80m 33s

        Subsystem        Report/Notes
        Docker           Image:yetus/hadoop:9560f25
        JIRA Issue       HDFS-10960
        JIRA Patch URL   https://issues.apache.org/jira/secure/attachment/12831625/HDFS-10960.01.patch
        Optional Tests   asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname            Linux 04c963d09245 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool       maven
        Personality      /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision     trunk / 44f48ee
        Default Java     1.8.0_101
        findbugs         v3.0.0
        Test Results     https://builds.apache.org/job/PreCommit-HDFS-Build/17008/testReport/
        modules          C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output   https://builds.apache.org/job/PreCommit-HDFS-Build/17008/console
        Powered by       Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        manojg Manoj Govindassamy added a comment -

        Attached the v01 patch, which addresses the problem as proposed in the previous comment. Lei (Eddy) Xu, can you please review the patch?

        manojg Manoj Govindassamy added a comment -

        Looking at the code, removing volumes on the DataNode can interrupt a BlockReceiver. If the BlockReceiver happens to be in the middle of an IO operation, such as flushing or setting the channel position for the new checksum, it can throw an IOException. On getting an IOException, BlockReceiver starts a thread to check for disk errors.

        TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails its verification if the DataNode ever started a disk error check thread. This verification doesn't seem useful, as we already have another verification that checks the block replication factor. So the proposal is to replace it with a verification that checks whether the volume removal succeeded and whether the replication factor of the block caught up after the volume removal.
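
        The interruption described above can be reproduced outside Hadoop. The following is a minimal sketch, not the DataNode code: it relies only on the documented java.nio behaviour that a thread whose interrupt status is set receives a ClosedByInterruptException (an IOException) from a blocking FileChannel write. The class name, writePacket, and the diskErrorChecks counter are hypothetical stand-ins for BlockReceiver and the disk error check.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a block writer; not Hadoop code.
class InterruptedWriterSketch {

  /**
   * Writes a few bytes on a fresh thread and returns how many
   * "disk error checks" the write triggered (0 or 1).
   */
  static long writePacket(boolean interruptedByVolumeRemoval) throws Exception {
    AtomicLong diskErrorChecks = new AtomicLong(0);
    Path file = Files.createTempFile("blk_sketch", ".data");

    Thread writer = new Thread(() -> {
      if (interruptedByVolumeRemoval) {
        // Assumption: models the interrupt delivered while a volume is removed.
        Thread.currentThread().interrupt();
      }
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
        // A blocking channel write on an interrupted thread closes the
        // channel and throws ClosedByInterruptException.
        ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
      } catch (IOException e) {
        // Any IOException is treated as a possible disk problem, even though
        // here the cause is the interrupt and not the disk.
        diskErrorChecks.incrementAndGet();
      }
    });

    writer.start();
    writer.join();
    Files.deleteIfExists(file);
    return diskErrorChecks.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("checks without interrupt: " + writePacket(false));
    System.out.println("checks with interrupt:    " + writePacket(true));
  }
}
```

        In a real run the interrupt can arrive at any point relative to the write, so an assertion that no disk error check ever ran depends on timing, which is consistent with the occasional failure reported in this issue.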


          People

          • Assignee: Manoj Govindassamy
          • Reporter: Manoj Govindassamy
          • Votes: 0
          • Watchers: 5
