Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-10830

FsDatasetImpl#removeVolumes crashes with IllegalMonitorStateException when vol being removed is in use

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: hdfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      FsDatasetImpl#removeVolumes() operation crashes abruptly with IllegalMonitorStateException whenever the volume being removed is in use concurrently.

      Looks like removeVolumes() is waiting on a monitor object "this" (that is FsDatasetImpl) which it has never locked, leading to IllegalMonitorStateException. This monitor wait happens only the volume being removed is in use (referencecount > 0). The thread performing this remove volume operation thus crashes abruptly and block invalidations for the remove volumes are totally skipped.

      FsDatasetImpl.java
      @Override
      public void removeVolumes(Set<File> volumesToRemove, boolean clearFailure) {
      ..
      ..
      try (AutoCloseableLock lock = datasetLock.acquire()) {   <== LOCK acquire datasetLock
      for (int idx = 0; idx < dataStorage.getNumStorageDirs(); idx++) {
        .. .. ..
        asyncDiskService.removeVolume(sd.getCurrentDir());     <== volume SD1 remove
        volumes.removeVolume(absRoot, clearFailure);
        volumes.waitVolumeRemoved(5000, this);                 <== WAIT on "this" ?? But, we haven't locked it yet.
                                                                   This will cause IllegalMonitorStateException
                                                                   and crash getBlockReports()/FBR thread!
      
        for (String bpid : volumeMap.getBlockPoolList()) {
          List<ReplicaInfo> blocks = new ArrayList<>();
          for (Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator();
               it.hasNext(); ) {
              .. .. .. 
              it.remove();                                     <== volumeMap removal
            }
          blkToInvalidate.put(bpid, blocks);
        }
       .. ..
      }                                                        <== LOCK release datasetLock   
      
      // Call this outside the lock.
      for (Map.Entry<String, List<ReplicaInfo>> entry :
      blkToInvalidate.entrySet()) {
       ..
       for (ReplicaInfo block : blocks) {
        invalidate(bpid, block);                               <== Notify NN of Block removal
       }
      }
      
      1. HDFS-10830.01.patch
        6 kB
        Arpit Agarwal
      2. HDFS-10830.02.patch
        6 kB
        Arpit Agarwal
      3. HDFS-10830.05.patch
        8 kB
        Arpit Agarwal
      4. HDFS-10830.06.patch
        8 kB
        Arpit Agarwal

        Issue Links

          Activity

          Hide
          manojg Manoj Govindassamy added a comment -

          Thanks Arpit Agarwal.

          Show
          manojg Manoj Govindassamy added a comment - Thanks Arpit Agarwal .
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10423 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10423/)
          HDFS-10830. FsDatasetImpl#removeVolumes crashes with (arp: rev a99bf26a0899bcc4307c3a242c8414eaef555aa7)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/AutoCloseableLock.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10423 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10423/ ) HDFS-10830 . FsDatasetImpl#removeVolumes crashes with (arp: rev a99bf26a0899bcc4307c3a242c8414eaef555aa7) (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/AutoCloseableLock.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Thanks Xiao Chen and Manoj Govindassamy. I committed this for 2.8.0.

          Manoj I also credited you for the patch.

          Show
          arpitagarwal Arpit Agarwal added a comment - Thanks Xiao Chen and Manoj Govindassamy . I committed this for 2.8.0. Manoj I also credited you for the patch.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Arpit, patch 6 LGTM +1.
          Failed test is BindException and unrelated.

          Show
          xiaochen Xiao Chen added a comment - Thanks Arpit, patch 6 LGTM +1. Failed test is BindException and unrelated.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 18s Maven dependency ordering for branch
          +1 mvninstall 6m 55s trunk passed
          +1 compile 7m 5s trunk passed
          +1 checkstyle 1m 28s trunk passed
          +1 mvnsite 1m 52s trunk passed
          +1 mvneclipse 0m 27s trunk passed
          +1 findbugs 3m 30s trunk passed
          +1 javadoc 1m 47s trunk passed
          0 mvndep 0m 14s Maven dependency ordering for patch
          +1 mvninstall 1m 29s the patch passed
          +1 compile 7m 16s the patch passed
          +1 javac 7m 16s the patch passed
          +1 checkstyle 1m 30s the patch passed
          +1 mvnsite 1m 50s the patch passed
          +1 mvneclipse 0m 27s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 3m 46s the patch passed
          +1 javadoc 1m 47s the patch passed
          +1 unit 8m 31s hadoop-common in the patch passed.
          -1 unit 81m 40s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 25s The patch does not generate ASF License warnings.
          133m 28s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10830
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827879/HDFS-10830.06.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 6a82f7ee0dda 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / bee9f57
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16704/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16704/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16704/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 18s Maven dependency ordering for branch +1 mvninstall 6m 55s trunk passed +1 compile 7m 5s trunk passed +1 checkstyle 1m 28s trunk passed +1 mvnsite 1m 52s trunk passed +1 mvneclipse 0m 27s trunk passed +1 findbugs 3m 30s trunk passed +1 javadoc 1m 47s trunk passed 0 mvndep 0m 14s Maven dependency ordering for patch +1 mvninstall 1m 29s the patch passed +1 compile 7m 16s the patch passed +1 javac 7m 16s the patch passed +1 checkstyle 1m 30s the patch passed +1 mvnsite 1m 50s the patch passed +1 mvneclipse 0m 27s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 46s the patch passed +1 javadoc 1m 47s the patch passed +1 unit 8m 31s hadoop-common in the patch passed. -1 unit 81m 40s hadoop-hdfs in the patch failed. +1 asflicense 0m 25s The patch does not generate ASF License warnings. 133m 28s Reason Tests Failed junit tests hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2 Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10830 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827879/HDFS-10830.06.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 6a82f7ee0dda 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / bee9f57 Default Java 1.8.0_101 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/16704/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16704/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16704/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          Looks good to me. Failed tests in patch 5's pre-commit look unrelated.

          +1 pending:

          • In the test, can we log the exception here? (i.e. s/+/,/g)
            LOG.info("Problem preparing volumes to remove: " + e);
          • Please fix the checkstyle and javadoc warnings.

          Thanks for the contribution, Manoj and Arpit.

          Show
          xiaochen Xiao Chen added a comment - Looks good to me. Failed tests in patch 5's pre-commit look unrelated. +1 pending: In the test, can we log the exception here? (i.e. s/+/,/g) LOG.info("Problem preparing volumes to remove: " + e); Please fix the checkstyle and javadoc warnings. Thanks for the contribution, Manoj and Arpit.
          Hide
          linyiqun Yiqun Lin added a comment -

          Just say a little, hope not to disturb you:

          Hopefully the just previous failed test results were from the old patch and not from v05 patch.

          One of failed test in this JIRA TestPendingInvalidateBlock was tracked by HDFS-10426. I see this test have failed intermittently for a long time.

          Show
          linyiqun Yiqun Lin added a comment - Just say a little, hope not to disturb you: Hopefully the just previous failed test results were from the old patch and not from v05 patch. One of failed test in this JIRA TestPendingInvalidateBlock was tracked by HDFS-10426 . I see this test have failed intermittently for a long time.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 15s Maven dependency ordering for branch
          +1 mvninstall 7m 40s trunk passed
          +1 compile 7m 40s trunk passed
          +1 checkstyle 1m 29s trunk passed
          +1 mvnsite 2m 0s trunk passed
          +1 mvneclipse 0m 27s trunk passed
          +1 findbugs 3m 26s trunk passed
          +1 javadoc 1m 45s trunk passed
          0 mvndep 0m 14s Maven dependency ordering for patch
          +1 mvninstall 1m 31s the patch passed
          +1 compile 7m 12s the patch passed
          +1 javac 7m 12s the patch passed
          -0 checkstyle 1m 25s root: The patch generated 1 new + 159 unchanged - 0 fixed = 160 total (was 159)
          +1 mvnsite 1m 47s the patch passed
          +1 mvneclipse 0m 25s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 3m 24s the patch passed
          -1 javadoc 0m 49s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 unit 8m 16s hadoop-common in the patch passed.
          -1 unit 58m 50s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 23s The patch does not generate ASF License warnings.
          111m 4s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestDecommissionWithStriped



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10830
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827857/HDFS-10830.05.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9f987e1290b8 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / bee9f57
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16703/artifact/patchprocess/diff-checkstyle-root.txt
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16703/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16703/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16703/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16703/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 13s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 15s Maven dependency ordering for branch +1 mvninstall 7m 40s trunk passed +1 compile 7m 40s trunk passed +1 checkstyle 1m 29s trunk passed +1 mvnsite 2m 0s trunk passed +1 mvneclipse 0m 27s trunk passed +1 findbugs 3m 26s trunk passed +1 javadoc 1m 45s trunk passed 0 mvndep 0m 14s Maven dependency ordering for patch +1 mvninstall 1m 31s the patch passed +1 compile 7m 12s the patch passed +1 javac 7m 12s the patch passed -0 checkstyle 1m 25s root: The patch generated 1 new + 159 unchanged - 0 fixed = 160 total (was 159) +1 mvnsite 1m 47s the patch passed +1 mvneclipse 0m 25s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 24s the patch passed -1 javadoc 0m 49s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 unit 8m 16s hadoop-common in the patch passed. -1 unit 58m 50s hadoop-hdfs in the patch failed. +1 asflicense 0m 23s The patch does not generate ASF License warnings. 111m 4s Reason Tests Failed junit tests hadoop.hdfs.TestDecommissionWithStriped Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10830 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827857/HDFS-10830.05.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 9f987e1290b8 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / bee9f57 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16703/artifact/patchprocess/diff-checkstyle-root.txt javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16703/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/16703/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16703/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16703/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          Thank you both for working on this!
          I can take a look later today, if no other binding +1 beats me.

          Show
          xiaochen Xiao Chen added a comment - Thank you both for working on this! I can take a look later today, if no other binding +1 beats me.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Correct. The Jenkins result contains the patch file name that it ran against.

          Show
          arpitagarwal Arpit Agarwal added a comment - Correct. The Jenkins result contains the patch file name that it ran against.
          Hide
          manojg Manoj Govindassamy added a comment -

          Hopefully the just previous failed test results were from the old patch and not from v05 patch. Please confirm Arpit Agarwal.

          Show
          manojg Manoj Govindassamy added a comment - Hopefully the just previous failed test results were from the old patch and not from v05 patch. Please confirm Arpit Agarwal .
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Thank you Manoj Govindassamy.

          Xiao Chen, can you take a look at the patch (need a binding +1)? Thanks!

          Show
          arpitagarwal Arpit Agarwal added a comment - Thank you Manoj Govindassamy . Xiao Chen , can you take a look at the patch (need a binding +1)? Thanks!
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          +1 mvninstall 8m 14s trunk passed
          +1 compile 0m 49s trunk passed
          +1 checkstyle 0m 29s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 2s trunk passed
          +1 mvninstall 0m 51s the patch passed
          +1 compile 0m 46s the patch passed
          +1 javac 0m 46s the patch passed
          +1 checkstyle 0m 27s the patch passed
          +1 mvnsite 0m 51s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 58s the patch passed
          +1 javadoc 0m 59s the patch passed
          -1 unit 64m 21s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          86m 4s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
            hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10830
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827849/HDFS-10830.04.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 1d2dc3b09e71 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / bee9f57
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16701/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16701/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16701/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 8m 14s trunk passed +1 compile 0m 49s trunk passed +1 checkstyle 0m 29s trunk passed +1 mvnsite 0m 56s trunk passed +1 mvneclipse 0m 12s trunk passed +1 findbugs 1m 55s trunk passed +1 javadoc 1m 2s trunk passed +1 mvninstall 0m 51s the patch passed +1 compile 0m 46s the patch passed +1 javac 0m 46s the patch passed +1 checkstyle 0m 27s the patch passed +1 mvnsite 0m 51s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 58s the patch passed +1 javadoc 0m 59s the patch passed -1 unit 64m 21s hadoop-hdfs in the patch failed. +1 asflicense 0m 19s The patch does not generate ASF License warnings. 86m 4s Reason Tests Failed junit tests hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock   hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10830 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827849/HDFS-10830.04.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 1d2dc3b09e71 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / bee9f57 Default Java 1.8.0_101 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/16701/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16701/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16701/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          manojg Manoj Govindassamy added a comment -

          v05 patch looks good to me. Test also passes through without any IllegalMonitorStateExceptions. Thanks Arpit Agarwal.

          Show
          manojg Manoj Govindassamy added a comment - v05 patch looks good to me. Test also passes through without any IllegalMonitorStateExceptions. Thanks Arpit Agarwal .
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 7m 17s trunk passed
          +1 compile 0m 48s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 45s trunk passed
          +1 javadoc 0m 57s trunk passed
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 44s the patch passed
          +1 javac 0m 44s the patch passed
          +1 checkstyle 0m 26s the patch passed
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 52s the patch passed
          +1 javadoc 0m 53s the patch passed
          +1 unit 58m 9s hadoop-hdfs in the patch passed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          78m 0s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10830
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827843/HDFS-10830.03.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 8a2e1a257af6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / cba973f
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16695/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16695/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 4 new or modified test files. +1 mvninstall 7m 17s trunk passed +1 compile 0m 48s trunk passed +1 checkstyle 0m 30s trunk passed +1 mvnsite 0m 52s trunk passed +1 mvneclipse 0m 12s trunk passed +1 findbugs 1m 45s trunk passed +1 javadoc 0m 57s trunk passed +1 mvninstall 0m 48s the patch passed +1 compile 0m 44s the patch passed +1 javac 0m 44s the patch passed +1 checkstyle 0m 26s the patch passed +1 mvnsite 0m 49s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 52s the patch passed +1 javadoc 0m 53s the patch passed +1 unit 58m 9s hadoop-hdfs in the patch passed. +1 asflicense 0m 20s The patch does not generate ASF License warnings. 78m 0s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10830 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827843/HDFS-10830.03.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 8a2e1a257af6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / cba973f Default Java 1.8.0_101 findbugs v3.0.0 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16695/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16695/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          v05: Let's try this once more. I tried to keep the test changes close to what you have in your latest branch-2 patch for HDFS-9781.

          Show
          arpitagarwal Arpit Agarwal added a comment - v05: Let's try this once more. I tried to keep the test changes close to what you have in your latest branch-2 patch for HDFS-9781 .
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 11s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          0 mvndep 0m 13s Maven dependency ordering for branch
          +1 mvninstall 6m 35s trunk passed
          +1 compile 6m 48s trunk passed
          +1 checkstyle 1m 26s trunk passed
          +1 mvnsite 1m 45s trunk passed
          +1 mvneclipse 0m 24s trunk passed
          +1 findbugs 2m 57s trunk passed
          +1 javadoc 1m 39s trunk passed
          0 mvndep 0m 13s Maven dependency ordering for patch
          +1 mvninstall 1m 22s the patch passed
          +1 compile 6m 46s the patch passed
          +1 javac 6m 46s the patch passed
          -0 checkstyle 1m 26s root: The patch generated 1 new + 114 unchanged - 0 fixed = 115 total (was 114)
          +1 mvnsite 1m 42s the patch passed
          +1 mvneclipse 0m 25s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 3m 13s the patch passed
          -1 javadoc 0m 44s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          -1 unit 16m 26s hadoop-common in the patch failed.
          +1 unit 56m 43s hadoop-hdfs in the patch passed.
          +1 asflicense 0m 23s The patch does not generate ASF License warnings.
          113m 4s



          Reason Tests
          Timed out junit tests org.apache.hadoop.http.TestHttpServerLifecycle



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10830
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827827/HDFS-10830.01.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 77e1c02747fb 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / cba973f
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16690/artifact/patchprocess/diff-checkstyle-root.txt
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16690/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16690/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16690/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16690/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 11s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 0 mvndep 0m 13s Maven dependency ordering for branch +1 mvninstall 6m 35s trunk passed +1 compile 6m 48s trunk passed +1 checkstyle 1m 26s trunk passed +1 mvnsite 1m 45s trunk passed +1 mvneclipse 0m 24s trunk passed +1 findbugs 2m 57s trunk passed +1 javadoc 1m 39s trunk passed 0 mvndep 0m 13s Maven dependency ordering for patch +1 mvninstall 1m 22s the patch passed +1 compile 6m 46s the patch passed +1 javac 6m 46s the patch passed -0 checkstyle 1m 26s root: The patch generated 1 new + 114 unchanged - 0 fixed = 115 total (was 114) +1 mvnsite 1m 42s the patch passed +1 mvneclipse 0m 25s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 13s the patch passed -1 javadoc 0m 44s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) -1 unit 16m 26s hadoop-common in the patch failed. +1 unit 56m 43s hadoop-hdfs in the patch passed. +1 asflicense 0m 23s The patch does not generate ASF License warnings. 113m 4s Reason Tests Timed out junit tests org.apache.hadoop.http.TestHttpServerLifecycle Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10830 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12827827/HDFS-10830.01.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 77e1c02747fb 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / cba973f Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16690/artifact/patchprocess/diff-checkstyle-root.txt javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16690/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/16690/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16690/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16690/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Can you please post an updated test patch for trunk that just has the correct test changes so we are in sync?

          Show
          arpitagarwal Arpit Agarwal added a comment - Can you please post an updated test patch for trunk that just has the correct test changes so we are in sync?
          Hide
          manojg Manoj Govindassamy added a comment -

          Arpit Agarwal,

          • But the base changeset in v04 patch looks like the one for ReplicaMap lock improvements HDFS-10828 and not for this bug.
          • Also, please make sure VolRemoveThread looks just like the one in HDFS-9781-branch-2.004.patch . Else we will have unnecessary code divergence between branch-2 and trunk.
          • And, once you use the right base patch + updated test, it should not time out. Locally its passing through for me.
          Show
          manojg Manoj Govindassamy added a comment - Arpit Agarwal , But the base changeset in v04 patch looks like the one for ReplicaMap lock improvements HDFS-10828 and not for this bug. Also, please make sure VolRemoveThread looks just like the one in HDFS-9781-branch-2.004.patch . Else we will have unnecessary code divergence between branch-2 and trunk. And, once you use the right base patch + updated test, it should not time out. Locally its passing through for me.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Thanks Manoj Govindassamy. Updated patch attached (although it still times out for me, pending HDFS-9781 I assume).

          Show
          arpitagarwal Arpit Agarwal added a comment - Thanks Manoj Govindassamy . Updated patch attached (although it still times out for me, pending HDFS-9781 I assume).
          Hide
          manojg Manoj Govindassamy added a comment -

          Arpit Agarwal, In the following block,

          • Try/catch is no more needed around line 650: dataset.removeVolumes(v. Especially line 654 and line 655 have to be removed.
          • By doing so, this VolRemoveThread would become just like the one in branch-2 patch in HDFS-9781. Refer: HDFS-9781-branch-2.004.patch.
          TestFsDatasetImpl#testRemoveVolumeBeingWritten
              class VolRemoveThread extends Thread {
                public void run() {
                  try {
                    Set<File> volumesToRemove = new HashSet<>();
                    volumesToRemove.add(StorageLocation.parse(
                        dataset.getVolume(eb).getBasePath()).getFile());
                    /**
                     * TODO: {@link FsDatasetImpl#removeVolumes(Set, boolean)} is throwing
                     * IllegalMonitorStateException when there is a parallel reader/writer
                     * to the volume. Remove below exception handling block after fixing
                     * HDFS-10830.
                     */
                    LOG.info("Removing volume " + volumesToRemove);
                    dataset.removeVolumes(volumesToRemove, true);
                    volRemoveCompletedLatch.countDown();
                    LOG.info("Removed volume " + volumesToRemove);
                  } catch (Exception e) {
                    LOG.info("Unexpected issue while removing volume: ", e);
                    volRemoveCompletedLatch.countDown();
                  }
                }
              }
          
          Show
          manojg Manoj Govindassamy added a comment - Arpit Agarwal , In the following block, Try/catch is no more needed around line 650: dataset.removeVolumes(v . Especially line 654 and line 655 have to be removed. By doing so, this VolRemoveThread would become just like the one in branch-2 patch in HDFS-9781 . Refer: HDFS-9781-branch-2.004.patch . TestFsDatasetImpl#testRemoveVolumeBeingWritten class VolRemoveThread extends Thread { public void run() { try { Set<File> volumesToRemove = new HashSet<>(); volumesToRemove.add(StorageLocation.parse( dataset.getVolume(eb).getBasePath()).getFile()); /** * TODO: {@link FsDatasetImpl#removeVolumes(Set, boolean )} is throwing * IllegalMonitorStateException when there is a parallel reader/writer * to the volume. Remove below exception handling block after fixing * HDFS-10830. */ LOG.info( "Removing volume " + volumesToRemove); dataset.removeVolumes(volumesToRemove, true ); volRemoveCompletedLatch.countDown(); LOG.info( "Removed volume " + volumesToRemove); } catch (Exception e) { LOG.info( "Unexpected issue while removing volume: " , e); volRemoveCompletedLatch.countDown(); } } }
          Hide
          arpitagarwal Arpit Agarwal added a comment - - edited

          Attaching the patch with the correct name.

          Hi Manoj Govindassamy, can you please point me to the exact location of the test workaround (specific line of code) so I know what to remove?

          Show
          arpitagarwal Arpit Agarwal added a comment - - edited Attaching the patch with the correct name. Hi Manoj Govindassamy , can you please point me to the exact location of the test workaround (specific line of code) so I know what to remove?
          Hide
          manojg Manoj Govindassamy added a comment -

          Arpit Agarwal,

          • I don't see v03 patch, probably you missed to attach it.
          • Patch v01 shows up as recently updated, but that doesn't have TestFsDatasetImpl#testRemoveVolumeBeingWritten changes to back out the workaround added for dataset.removeVolumes(). Can you please take care of this ?
          Show
          manojg Manoj Govindassamy added a comment - Arpit Agarwal , I don't see v03 patch, probably you missed to attach it. Patch v01 shows up as recently updated, but that doesn't have TestFsDatasetImpl#testRemoveVolumeBeingWritten changes to back out the workaround added for dataset.removeVolumes() . Can you please take care of this ?
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          v03 patch: rebased to trunk.

          Show
          arpitagarwal Arpit Agarwal added a comment - v03 patch: rebased to trunk.
          Hide
          manojg Manoj Govindassamy added a comment -

          Arpit Agarwal, attached v004 patch to HDFS-9781 with updated testcase. I got successful run with your fix patch. Can you also please verify your fix with the updated testcase ? Thanks for finding this issue and apologies for the inconvenience.

          Show
          manojg Manoj Govindassamy added a comment - Arpit Agarwal , attached v004 patch to HDFS-9781 with updated testcase. I got successful run with your fix patch. Can you also please verify your fix with the updated testcase ? Thanks for finding this issue and apologies for the inconvenience.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          The test timeout appears to have been introduced by HDFS-9781. I hit it even after backing out HDFS-10830, HDFS-10773 and HDFS-10682.

          Show
          arpitagarwal Arpit Agarwal added a comment - The test timeout appears to have been introduced by HDFS-9781 . I hit it even after backing out HDFS-10830 , HDFS-10773 and HDFS-10682 .
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          0 mvndep 1m 29s Maven dependency ordering for branch
          +1 mvninstall 7m 50s trunk passed
          +1 compile 7m 33s trunk passed
          +1 checkstyle 1m 27s trunk passed
          +1 mvnsite 2m 1s trunk passed
          +1 mvneclipse 0m 32s trunk passed
          +1 findbugs 3m 6s trunk passed
          +1 javadoc 1m 40s trunk passed
          0 mvndep 0m 14s Maven dependency ordering for patch
          +1 mvninstall 1m 23s the patch passed
          +1 compile 6m 51s the patch passed
          +1 javac 6m 51s the patch passed
          -0 checkstyle 1m 26s root: The patch generated 1 new + 114 unchanged - 0 fixed = 115 total (was 114)
          +1 mvnsite 1m 53s the patch passed
          +1 mvneclipse 0m 30s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 3m 28s the patch passed
          -1 javadoc 0m 49s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          -1 unit 8m 29s hadoop-common in the patch failed.
          -1 unit 75m 2s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 24s The patch does not generate ASF License warnings.
          128m 28s



          Reason Tests
          Failed junit tests hadoop.net.TestDNS
            hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10830
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826968/HDFS-10830.02.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2bb80c25c605 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 07650bc
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16629/artifact/patchprocess/diff-checkstyle-root.txt
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16629/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16629/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16629/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16629/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16629/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 20s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 0 mvndep 1m 29s Maven dependency ordering for branch +1 mvninstall 7m 50s trunk passed +1 compile 7m 33s trunk passed +1 checkstyle 1m 27s trunk passed +1 mvnsite 2m 1s trunk passed +1 mvneclipse 0m 32s trunk passed +1 findbugs 3m 6s trunk passed +1 javadoc 1m 40s trunk passed 0 mvndep 0m 14s Maven dependency ordering for patch +1 mvninstall 1m 23s the patch passed +1 compile 6m 51s the patch passed +1 javac 6m 51s the patch passed -0 checkstyle 1m 26s root: The patch generated 1 new + 114 unchanged - 0 fixed = 115 total (was 114) +1 mvnsite 1m 53s the patch passed +1 mvneclipse 0m 30s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 28s the patch passed -1 javadoc 0m 49s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) -1 unit 8m 29s hadoop-common in the patch failed. -1 unit 75m 2s hadoop-hdfs in the patch failed. +1 asflicense 0m 24s The patch does not generate ASF License warnings. 128m 28s Reason Tests Failed junit tests hadoop.net.TestDNS   hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10830 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826968/HDFS-10830.02.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 2bb80c25c605 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 07650bc Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16629/artifact/patchprocess/diff-checkstyle-root.txt javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16629/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/16629/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/16629/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16629/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16629/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          v2 patch fixes the wait/await bug.

          Show
          arpitagarwal Arpit Agarwal added a comment - v2 patch fixes the wait/await bug.
          Hide
          manojg Manoj Govindassamy added a comment -

          Sounds good to me. Thanks a lot Xiao Chen.

          Show
          manojg Manoj Govindassamy added a comment - Sounds good to me. Thanks a lot Xiao Chen .
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks for working on this Arpit Agarwal and Manoj Govindassamy.
          Just letting you know that I'm about to commit HDFS-9781, which has a workaround in unit test of this issue. Please update to trunk and remove that workaround in this patch. (I think it's good to decouple these 2 jiras, so the unit test workaround is fine).

          Show
          xiaochen Xiao Chen added a comment - Thanks for working on this Arpit Agarwal and Manoj Govindassamy . Just letting you know that I'm about to commit HDFS-9781 , which has a workaround in unit test of this issue. Please update to trunk and remove that workaround in this patch. (I think it's good to decouple these 2 jiras, so the unit test workaround is fine).
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Manoj Govindassamy, you're right. findbugs caught it too. I'll post an updated patch later today. Thank you for the review!

          Show
          arpitagarwal Arpit Agarwal added a comment - Manoj Govindassamy , you're right. findbugs caught it too. I'll post an updated patch later today. Thank you for the review!
          Hide
          manojg Manoj Govindassamy added a comment - - edited

          Arpit Agarwal,

            void waitVolumeRemoved(int sleepMillis, Condition condition) {
                .. .. ..
                try {
                  condition.wait(sleepMillis);  <==
                } catch (InterruptedException e) {
                  FsDatasetImpl.LOG.info("Thread interrupted when waiting for "
                      + "volume reference to be released.");
                  Thread.currentThread().interrupt();
                }
          
          1. By calling Object monitor method wait() on Condition variable, it will be treated like a normal Object and will try to use its intrinsic monitor as against the Condition's wait-set instance (here datasetLockCondition). Copying below the note I read from java doc on Condition.

          Ref

          Note that Condition instances are just normal objects and can themselves be used as the target in a synchronized statement, and can have their own monitor wait and notification methods invoked. Acquiring the monitor lock of a Condition instance, or using its monitor methods, has no specified relationship with acquiring the Lock associated with that Condition or the use of its waiting and signalling methods. It is recommended that to avoid confusion you never use Condition instances in this way, except perhaps within their own implementation.

          1. So, the usage of condition.wait() here expects the Object monitor to be locked using synchronized() and since it is not locked, we will again get IllegalMonitorStateException. Please let me know if my understanding is wrong.
          Show
          manojg Manoj Govindassamy added a comment - - edited Arpit Agarwal , void waitVolumeRemoved( int sleepMillis, Condition condition) { .. .. .. try { condition.wait(sleepMillis); <== } catch (InterruptedException e) { FsDatasetImpl.LOG.info( " Thread interrupted when waiting for " + "volume reference to be released." ); Thread .currentThread().interrupt(); } By calling Object monitor method wait() on Condition variable, it will be treated like a normal Object and will try to use its intrinsic monitor as against the Condition's wait-set instance (here datasetLockCondition ). Copying below the note I read from java doc on Condition. Ref Note that Condition instances are just normal objects and can themselves be used as the target in a synchronized statement, and can have their own monitor wait and notification methods invoked. Acquiring the monitor lock of a Condition instance, or using its monitor methods, has no specified relationship with acquiring the Lock associated with that Condition or the use of its waiting and signalling methods. It is recommended that to avoid confusion you never use Condition instances in this way, except perhaps within their own implementation. So, the usage of condition.wait() here expects the Object monitor to be locked using synchronized() and since it is not locked, we will again get IllegalMonitorStateException. Please let me know if my understanding is wrong.
          Hide
          manojg Manoj Govindassamy added a comment -

          sure, lets track the wait/signal improvements with a new jira. Give me some time, will take one more look at this patch and get back to you.

          Show
          manojg Manoj Govindassamy added a comment - sure, lets track the wait/signal improvements with a new jira. Give me some time, will take one more look at this patch and get back to you.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Wouldn't it be better to go for the following ..wait and signal model compared to polling

          I completely agree, but that may be a more complex change. Let's fix the immediate problem first and address the signaling improvement later. Sound fair?

          I assigned it to myself.

          Show
          arpitagarwal Arpit Agarwal added a comment - Wouldn't it be better to go for the following ..wait and signal model compared to polling I completely agree, but that may be a more complex change. Let's fix the immediate problem first and address the signaling improvement later. Sound fair? I assigned it to myself.
          Hide
          manojg Manoj Govindassamy added a comment -

          Thanks for working on this Arpit Agarwal and submitting a patch. Would you be willing to own this bug ? If so, please feel free to assign this bug to yourself.

          Had a quick look at the patch. I believe the submitted patch will not throw IllegalMonitorStateException. But, looks like the patch also retains the old "polling" model. That is, wait and check in a loop. Wouldn't it be better to go for the following ..

          • wait and signal model compared to polling
          • and with a overall timeout to give up or abort the remove volume operation as the other concurrent user is taking a long time to finish his operations and the reference count not going to zero.

          Please let me know your thoughts on above.

          Show
          manojg Manoj Govindassamy added a comment - Thanks for working on this Arpit Agarwal and submitting a patch. Would you be willing to own this bug ? If so, please feel free to assign this bug to yourself. Had a quick look at the patch. I believe the submitted patch will not throw IllegalMonitorStateException. But, looks like the patch also retains the old "polling" model. That is, wait and check in a loop. Wouldn't it be better to go for the following .. wait and signal model compared to polling and with a overall timeout to give up or abort the remove volume operation as the other concurrent user is taking a long time to finish his operations and the reference count not going to zero. Please let me know your thoughts on above.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          0 mvndep 0m 13s Maven dependency ordering for branch
          +1 mvninstall 6m 39s trunk passed
          +1 compile 6m 44s trunk passed
          +1 checkstyle 1m 24s trunk passed
          +1 mvnsite 1m 45s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 2m 58s trunk passed
          +1 javadoc 1m 39s trunk passed
          0 mvndep 0m 13s Maven dependency ordering for patch
          +1 mvninstall 1m 21s the patch passed
          +1 compile 6m 45s the patch passed
          +1 javac 6m 45s the patch passed
          -0 checkstyle 1m 27s root: The patch generated 1 new + 114 unchanged - 0 fixed = 115 total (was 114)
          +1 mvnsite 1m 46s the patch passed
          +1 mvneclipse 0m 24s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 findbugs 2m 0s hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          -1 javadoc 0m 47s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 unit 7m 47s hadoop-common in the patch passed.
          -1 unit 60m 8s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          108m 26s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            Monitor wait() called on a Condition in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(int, Condition) At FsVolumeList.java:Condition in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(int, Condition) At FsVolumeList.java:[line 285]
            Calling wait rather than await in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(int, Condition) At FsVolumeList.java:in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(int, Condition) At FsVolumeList.java:[line 285]
          Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10830
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826829/HDFS-10830.01.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 6359cf78b7b7 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 05f5c0f
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/diff-checkstyle-root.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16617/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16617/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 0 mvndep 0m 13s Maven dependency ordering for branch +1 mvninstall 6m 39s trunk passed +1 compile 6m 44s trunk passed +1 checkstyle 1m 24s trunk passed +1 mvnsite 1m 45s trunk passed +1 mvneclipse 0m 25s trunk passed +1 findbugs 2m 58s trunk passed +1 javadoc 1m 39s trunk passed 0 mvndep 0m 13s Maven dependency ordering for patch +1 mvninstall 1m 21s the patch passed +1 compile 6m 45s the patch passed +1 javac 6m 45s the patch passed -0 checkstyle 1m 27s root: The patch generated 1 new + 114 unchanged - 0 fixed = 115 total (was 114) +1 mvnsite 1m 46s the patch passed +1 mvneclipse 0m 24s the patch passed -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply -1 findbugs 2m 0s hadoop-hdfs-project/hadoop-hdfs generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) -1 javadoc 0m 47s hadoop-common-project_hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 unit 7m 47s hadoop-common in the patch passed. -1 unit 60m 8s hadoop-hdfs in the patch failed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 108m 26s Reason Tests FindBugs module:hadoop-hdfs-project/hadoop-hdfs   Monitor wait() called on a Condition in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(int, Condition) At FsVolumeList.java:Condition in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(int, Condition) At FsVolumeList.java: [line 285]   Calling wait rather than await in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(int, Condition) At FsVolumeList.java:in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(int, Condition) At FsVolumeList.java: [line 285] Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10830 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826829/HDFS-10830.01.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 6359cf78b7b7 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 05f5c0f Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/diff-checkstyle-root.txt whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/whitespace-eol.txt findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/16617/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16617/testReport/ modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16617/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          arpitagarwal Arpit Agarwal added a comment - - edited

          Thanks for the clarification Xiao Chen.

          Manoj Govindassamy, we can replace the current wait with a wait on a condition object corresponding to the ReentrantLock. If you haven't started I can post a simple patch to fix this. It would also require a change to replace FsVolumeList#checkDirsMutex with a separate reentrant lock.

          Show
          arpitagarwal Arpit Agarwal added a comment - - edited Thanks for the clarification Xiao Chen . Manoj Govindassamy , we can replace the current wait with a wait on a condition object corresponding to the ReentrantLock. If you haven't started I can post a simple patch to fix this. It would also require a change to replace FsVolumeList#checkDirsMutex with a separate reentrant lock.
          Hide
          manojg Manoj Govindassamy added a comment -

          Xiao Chen,

          At that time removeVolumes was in a synchronized block.

          Ohh, I am getting it now. That change totally makes sense then. Thanks for the clarification. And, my bad for not digging the old changesets and history fully. Now, as removesVolumes() is not a synchronized method and with more locking changes around, we will have to reason out a wait/signal model for this removeVolumes() operations.

          Show
          manojg Manoj Govindassamy added a comment - Xiao Chen , At that time removeVolumes was in a synchronized block. Ohh, I am getting it now. That change totally makes sense then. Thanks for the clarification. And, my bad for not digging the old changesets and history fully. Now, as removesVolumes() is not a synchronized method and with more locking changes around, we will have to reason out a wait/signal model for this removeVolumes() operations.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Manoj and Arpit. IIRC the reason of not Thread.sleep here was that there was a findbug warning complaining sleeping while holding a lock, while I tried to fix a locking issue in HDFS-9701. At that time removeVolumes was in a synchronized block. No objections on fixing this now that the locking on FSDatasetImpl is changed.

          Show
          xiaochen Xiao Chen added a comment - Thanks Manoj and Arpit. IIRC the reason of not Thread.sleep here was that there was a findbug warning complaining sleeping while holding a lock, while I tried to fix a locking issue in HDFS-9701 . At that time removeVolumes was in a synchronized block. No objections on fixing this now that the locking on FSDatasetImpl is changed.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          We can probably fix this without a temporary workaround as it is a simple fix. The current wait is just equivalent to releasing the lock, sleeping for sleepMillis and then reacquiring it. So I think we can just replace it accordingly (unless Lei or Xiao get back with something we missed). We can fix waitVolumeRemoved to not accept the monitor.

          Thanks for adding a test case that reliably repros it. Did this bug make it into to the 3.0.0-alpha1 RC?

          Show
          arpitagarwal Arpit Agarwal added a comment - We can probably fix this without a temporary workaround as it is a simple fix. The current wait is just equivalent to releasing the lock, sleeping for sleepMillis and then reacquiring it. So I think we can just replace it accordingly (unless Lei or Xiao get back with something we missed). We can fix waitVolumeRemoved to not accept the monitor. Thanks for adding a test case that reliably repros it. Did this bug make it into to the 3.0.0-alpha1 RC?
          Hide
          manojg Manoj Govindassamy added a comment -

          As part of HDFS-9781, I have updated the unit test TestFsDatasetImpl#testRemoveVolumeBeingWritten to recreate this IllegalMonitorStateException problem consistently. For now to unblock HDFS-9781 tests, I have placed a workaround to ignore any exceptions coming out of FsDatasetImpl#removeVolumes(). The fix patch for this jira issue HDFS-10830 should remove the workaround placed earlier and verify the fix working.

          Show
          manojg Manoj Govindassamy added a comment - As part of HDFS-9781 , I have updated the unit test TestFsDatasetImpl#testRemoveVolumeBeingWritten to recreate this IllegalMonitorStateException problem consistently. For now to unblock HDFS-9781 tests, I have placed a workaround to ignore any exceptions coming out of FsDatasetImpl#removeVolumes() . The fix patch for this jira issue HDFS-10830 should remove the workaround placed earlier and verify the fix working.
          Hide
          manojg Manoj Govindassamy added a comment -

          Thanks for taking a look at this Arpit Agarwal.

          Yes, doesn't look like the current user of the volume is notifying the waiters. First, waitVolumeRemoved() callers submit their own monitor object and I am not user if other volumes users will have access to this monitor to signal them. More over, this while block can loop for a longer duration than 5seconds and the removeVolumes() with FsDatasetLock#datasetLock acquired can starve others for a long time.

          Show
          manojg Manoj Govindassamy added a comment - Thanks for taking a look at this Arpit Agarwal . Yes, doesn't look like the current user of the volume is notifying the waiters. First, waitVolumeRemoved() callers submit their own monitor object and I am not user if other volumes users will have access to this monitor to signal them. More over, this while block can loop for a longer duration than 5seconds and the removeVolumes() with FsDatasetLock#datasetLock acquired can starve others for a long time.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Hi Lei (Eddy) Xu, Xiao Chen, I am looking into fixing this issue (it's a recent regression).

          However I can't find a notify call that would wake up monitor.wait(sleepMillis) in FsVolumeList#waitVolumeRemoved. Could you please let me know if there is such a call that I am missing?

          Show
          arpitagarwal Arpit Agarwal added a comment - Hi Lei (Eddy) Xu , Xiao Chen , I am looking into fixing this issue (it's a recent regression). However I can't find a notify call that would wake up monitor.wait(sleepMillis) in FsVolumeList#waitVolumeRemoved . Could you please let me know if there is such a call that I am missing?
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Thanks for reporting this Manoj Govindassamy. I am looking for a notify/notifyAll corresponding to this wait call.

          So far I don't see it, so this wait() call may just be a sleep.

          Show
          arpitagarwal Arpit Agarwal added a comment - Thanks for reporting this Manoj Govindassamy . I am looking for a notify/notifyAll corresponding to this wait call. So far I don't see it, so this wait() call may just be a sleep.

            People

            • Assignee:
              arpitagarwal Arpit Agarwal
              Reporter:
              manojg Manoj Govindassamy
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development