Hadoop HDFS / HDFS-10798

Make the threshold of reporting FSNamesystem lock contention configurable

    Details

    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      Currently FSNamesystem#WRITELOCK_REPORTING_THRESHOLD is fixed at 1 second. On a busy system, a higher threshold may be preferable to reduce logging overhead. In other scenarios, more aggressive reporting (a lower threshold) may be preferable. We should make the threshold configurable.
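
      For illustration, here is a minimal sketch of threshold-based lock hold reporting. It is not the FSNamesystem code: the class `ReportingWriteLock`, its methods, the injectable clock, and the in-memory `reports` list (a stand-in for a logger) are all hypothetical names chosen for this example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

// Hypothetical illustration only; this is not the FSNamesystem implementation.
class ReportingWriteLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final long thresholdMs;      // configurable reporting threshold
  private final LongSupplier clockMs;  // injectable clock (assumption, eases testing)
  private final List<String> reports = new CopyOnWriteArrayList<>(); // stand-in for a logger
  private long acquiredAtMs;           // only touched while the write lock is held

  ReportingWriteLock(long thresholdMs, LongSupplier clockMs) {
    this.thresholdMs = thresholdMs;
    this.clockMs = clockMs;
  }

  void writeLock() {
    lock.writeLock().lock();
    // Record the start time only for the outermost (non-reentrant) acquisition.
    if (lock.getWriteHoldCount() == 1) {
      acquiredAtMs = clockMs.getAsLong();
    }
  }

  void writeUnlock() {
    boolean outermost = lock.getWriteHoldCount() == 1;
    long heldMs = clockMs.getAsLong() - acquiredAtMs;
    lock.writeLock().unlock();
    // Report after releasing, so the reporting itself does not extend the hold.
    if (outermost && heldMs >= thresholdMs) {
      reports.add("write lock held for " + heldMs + " ms");
    }
  }

  List<String> reports() {
    return reports;
  }

  public static void main(String[] args) {
    long[] nowMs = {0L}; // fake clock
    ReportingWriteLock l = new ReportingWriteLock(1000L, () -> nowMs[0]);
    l.writeLock(); nowMs[0] += 200L;  l.writeUnlock(); // below threshold, not reported
    l.writeLock(); nowMs[0] += 1500L; l.writeUnlock(); // above threshold, reported
    System.out.println(l.reports());
  }
}
```

      With a 1000 ms threshold only the 1500 ms hold is recorded. Raising the threshold trades visibility for lower reporting overhead, which is the tuning knob this issue asks for.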

      1. HDFS-10789.002.patch
        7 kB
        Erik Krogen
      2. HDFS-10789.001.patch
        7 kB
        Erik Krogen

          Activity

          xkrogen Erik Krogen added a comment -

          Add configuration 'dfs.namenode.write-lock.reporting.threshold.ms' to configure this value.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 57s trunk passed
          +1 compile 0m 49s trunk passed
          +1 checkstyle 0m 37s trunk passed
          +1 mvnsite 0m 57s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 46s trunk passed
          +1 javadoc 0m 58s trunk passed
          +1 mvninstall 0m 47s the patch passed
          +1 compile 0m 47s the patch passed
          +1 javac 0m 47s the patch passed
          -0 checkstyle 0m 32s hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 581 unchanged - 1 fixed = 585 total (was 582)
          +1 mvnsite 0m 55s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 1m 54s the patch passed
          +1 javadoc 0m 54s the patch passed
          -1 unit 77m 32s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          97m 50s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
            hadoop.hdfs.qjournal.client.TestQuorumJournalManager
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.server.namenode.TestCacheDirectives



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10798
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12825572/HDFS-10789.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 68ea9baea93d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 81485db
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16545/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16545/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16545/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16545/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          zhz Zhe Zhang added a comment -

          Thanks Erik Krogen for the work. Patch LGTM overall. One nit and one request:

          1. dfs.namenode.write-lock.reporting.threshold.ms would be better as dfs.namenode.write-lock-reporting-threshold-ms. I don't think we have formally documented a naming convention for config keys, but intuitively a . indicates a hierarchy.
          2. Could you also verify the reported test failures?

          +1 pending the above.

          xkrogen Erik Krogen added a comment -

          1. Fair. It seems many keys in DFSConfigKeys follow the "replace spaces with periods" convention but I agree that the hierarchy convention is more intuitive. I have submitted a new patch.
          2. All of these tests pass locally. They do not seem related to this change.

          zhz Zhe Zhang added a comment -

          Thanks Erik. +1 on the patch. Will commit once Jenkins returns.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 56s trunk passed
          +1 compile 0m 47s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 45s trunk passed
          +1 javadoc 0m 54s trunk passed
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 43s the patch passed
          +1 javac 0m 43s the patch passed
          -0 checkstyle 0m 30s hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 582 unchanged - 1 fixed = 586 total (was 583)
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 1m 47s the patch passed
          +1 javadoc 0m 52s the patch passed
          -1 unit 71m 40s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          91m 4s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestRollingUpgrade



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-10798
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12825682/HDFS-10789.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 8e3ff374e751 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9ef632f
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16550/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/16550/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16550/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16550/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          zhz Zhe Zhang added a comment -

          I'll commit the patch to trunk~branch-2.7 shortly. Thanks Erik Krogen for the contribution.

          xkrogen Erik Krogen added a comment -

          Thanks Zhe Zhang!

          zhz Zhe Zhang added a comment -

          Committed to trunk~branch-2.7.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10358 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10358/)
          HDFS-10798. Make the threshold of reporting FSNamesystem lock contention (zhz: rev 407b519fb14f79f19ebc4fbdf08204336a7acf77)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java
          andrew.wang Andrew Wang added a comment -

          Hi folks, I was looking through the lock logging patches and had a question. I noticed that the read lock and write lock threshold defaults are set differently:

            public static final String  DFS_NAMENODE_WRITE_LOCK_REPORTING_THRESHOLD_MS_KEY =
                "dfs.namenode.write-lock-reporting-threshold-ms";
            public static final long    DFS_NAMENODE_WRITE_LOCK_REPORTING_THRESHOLD_MS_DEFAULT = 1000L;
            public static final String  DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_KEY =
                "dfs.namenode.read-lock-reporting-threshold-ms";
            public static final long    DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_DEFAULT = 5000L;
          

          And I saw this JIRA that made the write lock threshold configurable because it was spamming a lot.

          Do you think it makes sense to change the write lock default to also be 5000 ms, same as the read lock? I can file a JIRA for this.
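
          As a rough sketch of how these keys and defaults interact, the lookup below falls back to the default when a key is unset. The key names and default values are the ones quoted above. The `LockThresholdConfig` class and its `getLong` helper are hypothetical, and a plain `Map` is used as a stand-in for the real configuration object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration lookup; not the Hadoop API.
class LockThresholdConfig {
  // Key names and defaults as quoted in the comment above.
  static final String WRITE_KEY = "dfs.namenode.write-lock-reporting-threshold-ms";
  static final long WRITE_DEFAULT = 1000L;
  static final String READ_KEY = "dfs.namenode.read-lock-reporting-threshold-ms";
  static final long READ_DEFAULT = 5000L;

  /** Returns the configured value for key, or defaultValue when the key is unset. */
  static long getLong(Map<String, String> conf, String key, long defaultValue) {
    String raw = conf.get(key);
    return raw == null ? defaultValue : Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Nothing set: the differing defaults apply.
    System.out.println(getLong(conf, WRITE_KEY, WRITE_DEFAULT)); // 1000
    System.out.println(getLong(conf, READ_KEY, READ_DEFAULT));   // 5000
    // An operator override aligns the write lock threshold with the read lock one.
    conf.put(WRITE_KEY, "5000");
    System.out.println(getLong(conf, WRITE_KEY, WRITE_DEFAULT)); // 5000
  }
}
```

          Because both thresholds are configurable, changing a default only affects deployments that have not set the key explicitly.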

          xkrogen Erik Krogen added a comment -

          My understanding of why they would be different is that a long read lock hold is less serious than a long write lock hold, since other reads can still proceed. Also, long reads may be more expected given listStatus and contentSummary type commands. It is also typical for a higher percentage of operations to be reads, so potential spam volume may be heavier for read locks than for write locks.

          That being said, 5000ms may still be a more sensible default, erring on the side of lower overhead unless an operator actually uses these log statements in which case they can tune the threshold themselves. Zhe Zhang, do you have any opinion?
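
          The point that other reads can still proceed under a held read lock can be sketched with the JDK's `ReentrantReadWriteLock`. This is a generic illustration, not NameNode code; `ReadWriteLockDemo` and its methods are names made up for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic read/write lock illustration; not taken from the NameNode.
class ReadWriteLockDemo {
  /** Tries, from a separate thread, to take the lock without blocking. */
  static boolean otherThreadCanAcquire(Lock wanted) {
    AtomicBoolean acquired = new AtomicBoolean(false);
    Thread t = new Thread(() -> {
      if (wanted.tryLock()) {
        acquired.set(true);
        wanted.unlock();
      }
    });
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return acquired.get();
  }

  /** While this thread holds the read lock, can another thread also read? */
  static boolean readerAdmittedWhileReadHeld() {
    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    rw.readLock().lock();
    try {
      return otherThreadCanAcquire(rw.readLock());
    } finally {
      rw.readLock().unlock();
    }
  }

  /** While this thread holds the write lock, can another thread read? */
  static boolean readerAdmittedWhileWriteHeld() {
    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    rw.writeLock().lock();
    try {
      return otherThreadCanAcquire(rw.readLock());
    } finally {
      rw.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    System.out.println(readerAdmittedWhileReadHeld());  // true
    System.out.println(readerAdmittedWhileWriteHeld()); // false
  }
}
```

          A long read lock hold therefore delays only writers, while a long write lock hold delays every other operation.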

          andrew.wang Andrew Wang added a comment -

          Thanks for the background, Erik. I filed a simple patch on HDFS-11466 to bump this default to 5000ms; we can continue the discussion there.

          zhz Zhe Zhang added a comment -

          5000ms default for writeLock sounds OK. Thanks for the discussion Erik and Andrew.


            People

            • Assignee:
              xkrogen Erik Krogen
              Reporter:
              zhz Zhe Zhang
            • Votes:
              0
              Watchers:
              8
