Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Right now the Namenode will log when a write lock is held for a long time to help tracks methods which are causing expensive delays. Let's do the same for read locks since these operations may also be expensive/long and cause delays.

      1. HDFS-10817.000.patch
        10 kB
        Erik Krogen
      2. HDFS-10817.001.patch
        10 kB
        Erik Krogen
      3. HDFS-10817.002.patch
        12 kB
        Erik Krogen
      4. HDFS-10817.003.patch
        12 kB
        Erik Krogen
      5. HDFS-10817.004.patch
        12 kB
        Erik Krogen
      6. HDFS-10817.addendum.000.patch
        4 kB
        Erik Krogen
      7. HDFS-10817.addendum.001.patch
        3 kB
        Zhe Zhang

        Issue Links

          Activity

          Hide
          xkrogen Erik Krogen added a comment -

          Added a patch to implement this with a configurable threshold. Required the use of ThreadLocal since multiple threads acquire read locks simultaneously.

          Show
          xkrogen Erik Krogen added a comment - Added a patch to implement this with a configurable threshold. Required the use of ThreadLocal since multiple threads acquire read locks simultaneously.
          Hide
          xkrogen Erik Krogen added a comment -

          Uploaded new patch that doesn't use semaphores; forgot about Java's CountDownLatch which is cleaner (used in testing).

          Show
          xkrogen Erik Krogen added a comment - Uploaded new patch that doesn't use semaphores; forgot about Java's CountDownLatch which is cleaner (used in testing).
          Hide
          zhz Zhe Zhang added a comment -

          Thanks Erik Krogen for the work! Patch looks pretty good. A few minor issues:

          1. Some lines are longer than 80 chars. Also I think multi-line comments should be:
            /**
             * blah blah
             */
            
          2. Can we unify the logic for read and write locking? I.e. lastUnlock and needReport
          3. readLockHeldTimeStamp.remove(); doesn't look necessary?
          4. It's a little flaky to only assert the log message contains the method name. Can we do more precise assertion here?
          5. Retrospectively, we should probably consolidate both write and read lock time-keeping in the FSNamesystemLock class itself. Doesn't make sense for both writeLock and writeLockInterruptibly to do the check and count time. We should file a follow-on JIRA for that.
          6. Another nice follow-on is to start a number of threads, some with sleep (longer than the threshold) and some don't. Then just verify the output has the ones with sleep.
          Show
          zhz Zhe Zhang added a comment - Thanks Erik Krogen for the work! Patch looks pretty good. A few minor issues: Some lines are longer than 80 chars. Also I think multi-line comments should be: /** * blah blah */ Can we unify the logic for read and write locking? I.e. lastUnlock and needReport readLockHeldTimeStamp.remove(); doesn't look necessary? It's a little flaky to only assert the log message contains the method name. Can we do more precise assertion here? Retrospectively, we should probably consolidate both write and read lock time-keeping in the FSNamesystemLock class itself. Doesn't make sense for both writeLock and writeLockInterruptibly to do the check and count time. We should file a follow-on JIRA for that. Another nice follow-on is to start a number of threads, some with sleep (longer than the threshold) and some don't. Then just verify the output has the ones with sleep.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 17s trunk passed
          +1 compile 0m 48s trunk passed
          +1 checkstyle 0m 31s trunk passed
          +1 mvnsite 0m 55s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 47s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 47s the patch passed
          +1 compile 0m 45s the patch passed
          +1 javac 0m 45s the patch passed
          -0 checkstyle 0m 30s hadoop-hdfs-project/hadoop-hdfs: The patch generated 12 new + 586 unchanged - 1 fixed = 598 total (was 587)
          +1 mvnsite 0m 52s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 1m 49s the patch passed
          +1 javadoc 0m 53s the patch passed
          +1 unit 57m 57s hadoop-hdfs in the patch passed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          78m 55s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10817
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826271/HDFS-10817.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 7156bedcdb0c 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d6d9cff
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16581/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16581/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16581/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 12s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 8m 17s trunk passed +1 compile 0m 48s trunk passed +1 checkstyle 0m 31s trunk passed +1 mvnsite 0m 55s trunk passed +1 mvneclipse 0m 12s trunk passed +1 findbugs 1m 47s trunk passed +1 javadoc 0m 56s trunk passed +1 mvninstall 0m 47s the patch passed +1 compile 0m 45s the patch passed +1 javac 0m 45s the patch passed -0 checkstyle 0m 30s hadoop-hdfs-project/hadoop-hdfs: The patch generated 12 new + 586 unchanged - 1 fixed = 598 total (was 587) +1 mvnsite 0m 52s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 1m 49s the patch passed +1 javadoc 0m 53s the patch passed +1 unit 57m 57s hadoop-hdfs in the patch passed. +1 asflicense 0m 20s The patch does not generate ASF License warnings. 78m 55s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10817 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826271/HDFS-10817.001.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 7156bedcdb0c 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / d6d9cff Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16581/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16581/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16581/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xkrogen Erik Krogen added a comment -

          Thanks for the comments Zhe Zhang! Uploading a new patch addressing some comments as described below.

          1. Thanks for pointing that out. Updated.
          2. Done. I originally had it that way because I didn't want to grab the ThreadLocal while inside the lock but after further considering the performance characteristics of this I do not think it is an issue.
          3. It is my understanding that it is good practice to clean up ThreadLocal values when you are finished with them, especially if they are set within a thread pool (which, IIUC, they are here). See http://stackoverflow.com/a/818120/5594176. However I do not have much experience with ThreadLocals and will defer to your call.
          4. I have made those a little more robust.
          5. This actually isn't possible under the current FSNamesystemLock design. Within FSNamesystemLock, the methods to actually lock are not exposed. It simply has methods writeLock and readLock which return the respective lock, then the calling class uses the lock/lockIinterruptibly methods on the returned objects.
          6. This is a good idea. I have added more variety to the threads which are tested in testFSReadLockLongHoldingReport.

          Show
          xkrogen Erik Krogen added a comment - Thanks for the comments Zhe Zhang ! Uploading a new patch addressing some comments as described below. 1. Thanks for pointing that out. Updated. 2. Done. I originally had it that way because I didn't want to grab the ThreadLocal while inside the lock but after further considering the performance characteristics of this I do not think it is an issue. 3. It is my understanding that it is good practice to clean up ThreadLocal values when you are finished with them, especially if they are set within a thread pool (which, IIUC, they are here). See http://stackoverflow.com/a/818120/5594176 . However I do not have much experience with ThreadLocals and will defer to your call. 4. I have made those a little more robust. 5. This actually isn't possible under the current FSNamesystemLock design. Within FSNamesystemLock , the methods to actually lock are not exposed. It simply has methods writeLock and readLock which return the respective lock, then the calling class uses the lock / lockIinterruptibly methods on the returned objects. 6. This is a good idea. I have added more variety to the threads which are tested in testFSReadLockLongHoldingReport .
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 8s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 6s trunk passed
          +1 compile 0m 44s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 43s trunk passed
          +1 javadoc 0m 58s trunk passed
          +1 mvninstall 0m 51s the patch passed
          +1 compile 0m 43s the patch passed
          +1 javac 0m 43s the patch passed
          -0 checkstyle 0m 29s hadoop-hdfs-project/hadoop-hdfs: The patch generated 6 new + 586 unchanged - 1 fixed = 592 total (was 587)
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 1m 51s the patch passed
          +1 javadoc 0m 53s the patch passed
          -1 unit 59m 57s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          79m 33s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10817
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826296/HDFS-10817.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 00070f740520 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 20ae1fa
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16585/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16585/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16585/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16585/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 8s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 6s trunk passed +1 compile 0m 44s trunk passed +1 checkstyle 0m 30s trunk passed +1 mvnsite 0m 56s trunk passed +1 mvneclipse 0m 12s trunk passed +1 findbugs 1m 43s trunk passed +1 javadoc 0m 58s trunk passed +1 mvninstall 0m 51s the patch passed +1 compile 0m 43s the patch passed +1 javac 0m 43s the patch passed -0 checkstyle 0m 29s hadoop-hdfs-project/hadoop-hdfs: The patch generated 6 new + 586 unchanged - 1 fixed = 592 total (was 587) +1 mvnsite 0m 49s the patch passed +1 mvneclipse 0m 9s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 1m 51s the patch passed +1 javadoc 0m 53s the patch passed -1 unit 59m 57s hadoop-hdfs in the patch failed. +1 asflicense 0m 17s The patch does not generate ASF License warnings. 79m 33s Reason Tests Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10817 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826296/HDFS-10817.002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 00070f740520 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 20ae1fa Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16585/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/16585/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16585/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16585/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xkrogen Erik Krogen added a comment -

          Test failure does not seem related.

          Updated patch removing some unused imports to clean up style checker.

          Show
          xkrogen Erik Krogen added a comment - Test failure does not seem related. Updated patch removing some unused imports to clean up style checker.
          Hide
          zhz Zhe Zhang added a comment -

          Thanks Erik for the updates.

          I'm not very familiar with ThreadLocal either. Thanks for the pointer and I agree it's a good idea to remove(). This way when the thread is recycled by the thread pool, the value will be reset.

          One corner case is when readUnlock is called before any readLock. Will we see an NPE? Since we are not overriding ThreadLocal#initialValue.

          5. This actually isn't possible under the current FSNamesystemLock design. Within FSNamesystemLock, the methods to actually lock are not exposed. It simply has methods writeLock and readLock which return the respective lock, then the calling class uses the lock/lockIinterruptibly methods on the returned objects.

          Good thoughts. I was initially thinking about moving the time-tracking logic into FSNamesystemLock. E.g. we can move writeLockHeldTimeStamp and readLockHeldTimeStamp into FSNamesystemLock. The lock class can take care of the time-tracking logic. This way lock and lockIinterruptibly can share the logic and the time-tracking logic can be unit tested more easily. We can also consider consolidating the threshold and needReport logic there too. Maybe also add richer APIs to the FSNamesystemLock class like reporting the overall lock held time of a thread (instead of a single reentrant period). But I think we should discuss all these as a separate follow-on anyway. Also, since we are printing the stack trace, the methods which actually acquire the lock are still logged right?

          The test code looks very good.

          So +1 after we clarify the readUnlock NPE concern.

          Show
          zhz Zhe Zhang added a comment - Thanks Erik for the updates. I'm not very familiar with ThreadLocal either. Thanks for the pointer and I agree it's a good idea to remove() . This way when the thread is recycled by the thread pool, the value will be reset. One corner case is when readUnlock is called before any readLock . Will we see an NPE? Since we are not overriding ThreadLocal#initialValue . 5. This actually isn't possible under the current FSNamesystemLock design. Within FSNamesystemLock, the methods to actually lock are not exposed. It simply has methods writeLock and readLock which return the respective lock, then the calling class uses the lock/lockIinterruptibly methods on the returned objects. Good thoughts. I was initially thinking about moving the time-tracking logic into FSNamesystemLock . E.g. we can move writeLockHeldTimeStamp and readLockHeldTimeStamp into FSNamesystemLock . The lock class can take care of the time-tracking logic. This way lock and lockIinterruptibly can share the logic and the time-tracking logic can be unit tested more easily. We can also consider consolidating the threshold and needReport logic there too. Maybe also add richer APIs to the FSNamesystemLock class like reporting the overall lock held time of a thread (instead of a single reentrant period). But I think we should discuss all these as a separate follow-on anyway. Also, since we are printing the stack trace, the methods which actually acquire the lock are still logged right? The test code looks very good. So +1 after we clarify the readUnlock NPE concern.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 23s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 28s trunk passed
          +1 compile 0m 56s trunk passed
          +1 checkstyle 0m 36s trunk passed
          +1 mvnsite 1m 1s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 52s trunk passed
          +1 javadoc 0m 57s trunk passed
          +1 mvninstall 0m 59s the patch passed
          +1 compile 0m 51s the patch passed
          +1 javac 0m 51s the patch passed
          -0 checkstyle 0m 32s hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 586 unchanged - 1 fixed = 589 total (was 587)
          +1 mvnsite 0m 57s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 1m 59s the patch passed
          +1 javadoc 0m 55s the patch passed
          -1 unit 88m 30s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          111m 0s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10817
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826458/HDFS-10817.003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 85b17cdd06f4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 20ae1fa
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16594/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16594/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16594/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16594/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 23s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 8m 28s trunk passed +1 compile 0m 56s trunk passed +1 checkstyle 0m 36s trunk passed +1 mvnsite 1m 1s trunk passed +1 mvneclipse 0m 12s trunk passed +1 findbugs 1m 52s trunk passed +1 javadoc 0m 57s trunk passed +1 mvninstall 0m 59s the patch passed +1 compile 0m 51s the patch passed +1 javac 0m 51s the patch passed -0 checkstyle 0m 32s hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 586 unchanged - 1 fixed = 589 total (was 587) +1 mvnsite 0m 57s the patch passed +1 mvneclipse 0m 12s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 1m 59s the patch passed +1 javadoc 0m 55s the patch passed -1 unit 88m 30s hadoop-hdfs in the patch failed. +1 asflicense 0m 18s The patch does not generate ASF License warnings. 111m 0s Reason Tests Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10817 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826458/HDFS-10817.003.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 85b17cdd06f4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 20ae1fa Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16594/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/16594/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16594/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16594/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xkrogen Erik Krogen added a comment -

          It is true that you would see an NPE. I initially ignored this since an Exception will be thrown anyway if readUnlock is called before readLock but it's probably best to keep the proper exception (IllegalMonitorStateException) being thrown. Submitted an updated patch.

          Show
          xkrogen Erik Krogen added a comment - It is true that you would see an NPE. I initially ignored this since an Exception will be thrown anyway if readUnlock is called before readLock but it's probably best to keep the proper exception ( IllegalMonitorStateException ) being thrown. Submitted an updated patch.
          Hide
          zhz Zhe Zhang added a comment -

          Thanks Erik. +1 on v4 patch. I'll commit after Jenkins.

          Show
          zhz Zhe Zhang added a comment - Thanks Erik. +1 on v4 patch. I'll commit after Jenkins.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 1s trunk passed
          +1 compile 0m 44s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 41s trunk passed
          +1 javadoc 0m 54s trunk passed
          +1 mvninstall 0m 45s the patch passed
          +1 compile 0m 41s the patch passed
          +1 javac 0m 41s the patch passed
          -0 checkstyle 0m 30s hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 586 unchanged - 1 fixed = 589 total (was 587)
          +1 mvnsite 0m 48s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 1m 45s the patch passed
          +1 javadoc 0m 50s the patch passed
          +1 unit 59m 16s hadoop-hdfs in the patch passed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          78m 32s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10817
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826498/HDFS-10817.004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux feaff1cc73ed 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 01721dd
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16599/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16599/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16599/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 1s trunk passed +1 compile 0m 44s trunk passed +1 checkstyle 0m 32s trunk passed +1 mvnsite 0m 51s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 1m 41s trunk passed +1 javadoc 0m 54s trunk passed +1 mvninstall 0m 45s the patch passed +1 compile 0m 41s the patch passed +1 javac 0m 41s the patch passed -0 checkstyle 0m 30s hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 586 unchanged - 1 fixed = 589 total (was 587) +1 mvnsite 0m 48s the patch passed +1 mvneclipse 0m 9s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 1m 45s the patch passed +1 javadoc 0m 50s the patch passed +1 unit 59m 16s hadoop-hdfs in the patch passed. +1 asflicense 0m 19s The patch does not generate ASF License warnings. 78m 32s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10817 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826498/HDFS-10817.004.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux feaff1cc73ed 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 01721dd Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16599/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16599/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16599/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          zhz Zhe Zhang added a comment -

          BTW backporting to lower branches hit this issue: http://stackoverflow.com/questions/28408109/java-stopped-erroring-on-non-final-variables-in-inner-classes-java-8

          I changed some variables to final in lower branches.

          Show
          zhz Zhe Zhang added a comment - BTW backporting to lower branches hit this issue: http://stackoverflow.com/questions/28408109/java-stopped-erroring-on-non-final-variables-in-inner-classes-java-8 I changed some variables to final in lower branches.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10381 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10381/)
          HDFS-10817. Add Logging for Long-held NN Read Locks. Contributed by Erik (zhz: rev 6f4b0d33ca339e3724623a1d23c101f8cfd3cdd5)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10381 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10381/ ) HDFS-10817 . Add Logging for Long-held NN Read Locks. Contributed by Erik (zhz: rev 6f4b0d33ca339e3724623a1d23c101f8cfd3cdd5) (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          Hide
          xkrogen Erik Krogen added a comment -

          Great, thanks Zhe!

          Show
          xkrogen Erik Krogen added a comment - Great, thanks Zhe!
          Hide
          xkrogen Erik Krogen added a comment - - edited

          Discovered that the test fails on branch-2 because the logs do not include the thread name, which the test was using to determine which log statements came from which threads. The test has been modified to use a different technique to determine this. See addendum patch.

          Show
          xkrogen Erik Krogen added a comment - - edited Discovered that the test fails on branch-2 because the logs do not include the thread name, which the test was using to determine which log statements came from which threads. The test has been modified to use a different technique to determine this. See addendum patch.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 31s trunk passed
          +1 compile 0m 46s trunk passed
          +1 checkstyle 0m 26s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 48s trunk passed
          +1 javadoc 0m 56s trunk passed
          +1 mvninstall 0m 50s the patch passed
          +1 compile 0m 47s the patch passed
          +1 javac 0m 47s the patch passed
          +1 checkstyle 0m 23s the patch passed
          +1 mvnsite 0m 53s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 57s the patch passed
          +1 javadoc 0m 55s the patch passed
          -1 unit 63m 53s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          84m 11s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestCrcCorruption
            hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
            hadoop.hdfs.TestErasureCodeBenchmarkThroughput



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10817
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826535/HDFS-10817.addendum.000.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 79cf3b33c099 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6f4b0d3
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16602/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16602/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16602/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 31s trunk passed +1 compile 0m 46s trunk passed +1 checkstyle 0m 26s trunk passed +1 mvnsite 0m 54s trunk passed +1 mvneclipse 0m 12s trunk passed +1 findbugs 1m 48s trunk passed +1 javadoc 0m 56s trunk passed +1 mvninstall 0m 50s the patch passed +1 compile 0m 47s the patch passed +1 javac 0m 47s the patch passed +1 checkstyle 0m 23s the patch passed +1 mvnsite 0m 53s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 57s the patch passed +1 javadoc 0m 55s the patch passed -1 unit 63m 53s hadoop-hdfs in the patch failed. +1 asflicense 0m 19s The patch does not generate ASF License warnings. 84m 11s Reason Tests Failed junit tests hadoop.hdfs.TestCrcCorruption   hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock   hadoop.hdfs.TestErasureCodeBenchmarkThroughput Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-10817 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12826535/HDFS-10817.addendum.000.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 79cf3b33c099 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 6f4b0d3 Default Java 1.8.0_101 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/16602/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16602/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16602/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          zhz Zhe Zhang added a comment -

          Thanks Erik. The addendum fix does solve the problem. Somehow the LogCapturer prints thread IDs in trunk but not in branch-2. But after the fix we are no longer relying on the behavior of LogCapturer.

          But the "Spin up a bunch of threads" part no longer makes much sense now. Because we are checking by method name, and 25 threads share the same method name. I removed that part in the test (based on the original trunk test we have already verified the behavior). We should think of a proper way to test with multiple parallel threads.

          Show
          zhz Zhe Zhang added a comment - Thanks Erik. The addendum fix does solve the problem. Somehow the LogCapturer prints thread IDs in trunk but not in branch-2. But after the fix we are no longer relying on the behavior of LogCapturer . But the "Spin up a bunch of threads" part no longer makes much sense now. Because we are checking by method name, and 25 threads share the same method name. I removed that part in the test (based on the original trunk test we have already verified the behavior). We should think of a proper way to test with multiple parallel threads.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10382 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10382/)
          Addendum fix for HDFS-10817 to fix failure of the added (zhz: rev 6c600360ca469d5fe0f017d681585db06c80c9cc)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10382 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10382/ ) Addendum fix for HDFS-10817 to fix failure of the added (zhz: rev 6c600360ca469d5fe0f017d681585db06c80c9cc) (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          Hide
          zhz Zhe Zhang added a comment -

          I committed the patch from trunk to branch-2.7. Note that from trunk to branch-2.8 there is an addendum fix, because the test failure was found when I tried backporting from branch-2.8 to branch-2.7.

          Thanks Erik Krogen for the contribution. Great work!

          Show
          zhz Zhe Zhang added a comment - I committed the patch from trunk to branch-2.7. Note that from trunk to branch-2.8 there is an addendum fix, because the test failure was found when I tried backporting from branch-2.8 to branch-2.7. Thanks Erik Krogen for the contribution. Great work!
          Hide
          xkrogen Erik Krogen added a comment -

          Thanks Zhe Zhang! But I disagree that spinning up multiple threads no longer makes sense. I check that 25 log outputs were printed and each thread will only print once, so 25 outputs implies that all 25 threads produced log output.

          Show
          xkrogen Erik Krogen added a comment - Thanks Zhe Zhang ! But I disagree that spinning up multiple threads no longer makes sense. I check that 25 log outputs were printed and each thread will only print once, so 25 outputs implies that all 25 threads produced log output.
          Hide
          zhz Zhe Zhang added a comment -

          Ahh, right, the patch verifies the method name as well as the count. Sorry that I removed it in the commit; wanted to to push the addendum so as to avoid impact on other ongoing JIRAs. After the series of FSNamesystem lock JIRAs are done, we should try to add a more holistic multi-thread test and add this one back.

          Show
          zhz Zhe Zhang added a comment - Ahh, right, the patch verifies the method name as well as the count. Sorry that I removed it in the commit; wanted to to push the addendum so as to avoid impact on other ongoing JIRAs. After the series of FSNamesystem lock JIRAs are done, we should try to add a more holistic multi-thread test and add this one back.
          Hide
          xkrogen Erik Krogen added a comment -

          Created HDFS-10829 to address concerns in testing noted in Zhe's previous comment.

          Show
          xkrogen Erik Krogen added a comment - Created HDFS-10829 to address concerns in testing noted in Zhe's previous comment.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Hi Erik Krogen,

          Thanks for adding this instrumentation. IIUC this implicitly assumes that with multiple readers, first thread to acquire the read lock will be the last thread to release it, since only the first thread to get the read lock will update readLockHeldTimeStamp. If another thread is the last to release it then readLockHeldTimeStamp.get() is going to return LONG.MAX_VALUE so the calculation will be off.

          Show
          arpitagarwal Arpit Agarwal added a comment - Hi Erik Krogen , Thanks for adding this instrumentation. IIUC this implicitly assumes that with multiple readers, first thread to acquire the read lock will be the last thread to release it, since only the first thread to get the read lock will update readLockHeldTimeStamp. If another thread is the last to release it then readLockHeldTimeStamp.get() is going to return LONG.MAX_VALUE so the calculation will be off.
          Hide
          xkrogen Erik Krogen added a comment -

          Arpit Agarwal, thank you for the concern but I do not believe that understanding is correct. getReadHoldCount is checking how many reentrant locking calls have been made. Each thread stores its own copy of readLockHeldTimestamp (note that it is a ThreadLocal, meaning each thread has its own copy) which is set on its first read lock call, and read on its final unlock call. Different threads will each monitor and record their time separately.

          Show
          xkrogen Erik Krogen added a comment - Arpit Agarwal , thank you for the concern but I do not believe that understanding is correct. getReadHoldCount is checking how many reentrant locking calls have been made. Each thread stores its own copy of readLockHeldTimestamp (note that it is a ThreadLocal , meaning each thread has its own copy) which is set on its first read lock call, and read on its final unlock call. Different threads will each monitor and record their time separately.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Got it thank you for the clarification. I got the threadLocal part but was confused about the re-entrant part.

          Show
          arpitagarwal Arpit Agarwal added a comment - Got it thank you for the clarification. I got the threadLocal part but was confused about the re-entrant part.
          Hide
          xkrogen Erik Krogen added a comment -

          Right, the getReadHoldCount vs. getReadLockCount distinction is not very obvious but very important!

          Show
          xkrogen Erik Krogen added a comment - Right, the getReadHoldCount vs. getReadLockCount distinction is not very obvious but very important!

            People

            • Assignee:
              xkrogen Erik Krogen
              Reporter:
              xkrogen Erik Krogen
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development