Hadoop HDFS / HDFS-11615

FSNamesystemLock metrics can be inaccurate due to millisecond precision

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.4
    • Fix Version/s: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
    • Component/s: hdfs
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently the FSNamesystemLock metrics created in HDFS-10872 track the lock hold time using Timer.monotonicNow(), which has millisecond-level precision. However, many of these operations hold the lock for less than a millisecond, making these metrics inaccurate. We should instead use System.nanoTime() for higher accuracy.
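The precision problem described above can be illustrated with a small standalone sketch. It does not use any Hadoop classes: the class and method names are hypothetical, the timestamps are made up, and the millisecond clock is assumed to behave like a nanosecond reading truncated to whole milliseconds.

```java
import java.util.concurrent.TimeUnit;

public class LockHoldPrecisionSketch {
  /** Truncates a nanosecond timestamp to whole milliseconds (assumed model of a ms clock). */
  public static long toMillis(long nanos) {
    return TimeUnit.NANOSECONDS.toMillis(nanos);
  }

  /** Hold time as seen through a millisecond clock: both endpoints are truncated first. */
  public static long heldMillis(long startNanos, long endNanos) {
    return toMillis(endNanos) - toMillis(startNanos);
  }

  /** Hold time measured directly in nanoseconds. */
  public static long heldNanos(long startNanos, long endNanos) {
    return endNanos - startNanos;
  }

  public static void main(String[] args) {
    // Hypothetical 40 microsecond hold that stays inside one millisecond tick.
    long start = 5_000_100L;
    long end = start + 40_000L;
    System.out.println(heldMillis(start, end)); // 0
    System.out.println(heldNanos(start, end));  // 40000

    // Hypothetical 20 microsecond hold that happens to straddle a tick boundary.
    long start2 = 5_990_000L;
    long end2 = 6_010_000L;
    System.out.println(heldMillis(start2, end2)); // 1
    System.out.println(heldNanos(start2, end2));  // 20000
  }
}
```

Under these assumptions a sub-millisecond hold is recorded as either 0 ms or 1 ms depending on where it falls relative to a tick, which is why averaging such samples gives an unreliable number.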

      1. HDFS-11615.000.patch
        9 kB
        Erik Krogen
      2. HDFS-11615.001.patch
        15 kB
        Erik Krogen

          Activity

          xkrogen Erik Krogen added a comment -

          Attaching v000 patch which exports these values in microseconds, which is more relevant at the timescale at which these operations occur. Some quick tests indicate the cheapest operations fall in the tens-of-microseconds range so nanosecond precision seems unnecessary. Added a new unit test and tested on a minicluster.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime   Comment
          0     reexec      0m 24s    Docker mode activated.
          +1    @author     0m 0s     The patch does not contain any @author tags.
          +1    test4tests  0m 0s     The patch appears to include 1 new or modified test files.
          +1    mvninstall  15m 37s   trunk passed
          +1    compile     0m 55s    trunk passed
          +1    checkstyle  0m 38s    trunk passed
          +1    mvnsite     0m 58s    trunk passed
          +1    mvneclipse  0m 16s    trunk passed
          +1    findbugs    1m 55s    trunk passed
          +1    javadoc     0m 42s    trunk passed
          +1    mvninstall  0m 53s    the patch passed
          +1    compile     0m 52s    the patch passed
          +1    javac       0m 52s    the patch passed
          +1    checkstyle  0m 35s    the patch passed
          +1    mvnsite     0m 56s    the patch passed
          +1    mvneclipse  0m 13s    the patch passed
          +1    whitespace  0m 0s     The patch has no whitespace issues.
          +1    xml         0m 2s     The patch has no ill-formed XML file.
          +1    findbugs    2m 5s     the patch passed
          +1    javadoc     0m 42s    the patch passed
          -1    unit        76m 37s   hadoop-hdfs in the patch failed.
          +1    asflicense  0m 22s    The patch does not generate ASF License warnings.
                            106m 27s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
            hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:612578f
          JIRA Issue HDFS-11615
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12862727/HDFS-11615.000.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux ce206893e4e1 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 14a3990
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19033/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19033/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19033/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          andrew.wang Andrew Wang added a comment -

          If we have nanosecond precision, is there a reason to use microsecond rather than nanosecond precision?

          xkrogen Erik Krogen added a comment -

          I was thinking that the numbers are a little easier for a human to parse and understand, and that microseconds are more indicative of the time scale of these operations, but I don't have a strong preference either way; I'm happy to adjust if there are other opinions.

          andrew.wang Andrew Wang added a comment -

          These metrics are normally consumed by a metrics system and regraphed, so I don't think human parsing is that important.

          On a related note, I like to put the unit (e.g. "Nanos") into the metric and variable names, since otherwise I have to look it up each time. Would it be possible to do that here too?

          xkrogen Erik Krogen added a comment -

          Fair enough, I will change it to nanoseconds. I can put it in the variable names, and I think that's a good idea - the mix of nano-level and milli-level in the class is a little confusing.

          The metric name is harder. Because MutableRate is used, you supply a base name and MutableRate appends the suffixes AvgTime and NumOps, so the metrics would look like FSNWriteLockOperationNameNanosAvgTime and FSNWriteLockOperationNameNanosNumOps, which is a little awkward. I could potentially create an additional constructor for MutableRate which allows you to specify a time unit. Any thoughts, Andrew Wang?
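The naming behavior described in this comment can be sketched as plain string composition. This is only an illustration of the resulting names, not the MutableRate implementation: the class and helper names are hypothetical, and "OperationName" is the same placeholder used in the comment.

```java
public class RateNameSketch {
  // The two suffixes the comment says are appended to a rate's base name.
  static final String NUM_OPS = "NumOps";
  static final String AVG_TIME = "AvgTime";

  /** Builds the base name from a prefix, an operation name, and a unit tag. */
  public static String baseName(String prefix, String operation, String unit) {
    return prefix + operation + unit;
  }

  /** Name of the operation-count metric for a given base name. */
  public static String numOpsName(String base) {
    return base + NUM_OPS;
  }

  /** Name of the average-time metric for a given base name. */
  public static String avgTimeName(String base) {
    return base + AVG_TIME;
  }

  public static void main(String[] args) {
    String base = baseName("FSNWriteLock", "OperationName", "Nanos");
    System.out.println(avgTimeName(base)); // FSNWriteLockOperationNameNanosAvgTime
    System.out.println(numOpsName(base));  // FSNWriteLockOperationNameNanosNumOps
  }
}
```

Because the unit is part of the base name, it ends up in front of both suffixes, including the count metric where a time unit carries no meaning.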

          andrew.wang Andrew Wang added a comment -

          Hi Erik,

          What's your proposal for new names? I'm guessing that the monitoring tools out there already understand a Hadoop MutableRate, so changing the names (even if it's awkward) will mean more work for them.

          Chances are these monitoring tools also support displaying a separate human-friendly name, so again it might not be important for the raw JMX output to be very human readable.

          xkrogen Erik Krogen added a comment -

          I would have suggested adding an optional time unit parameter, so that a rate measured in e.g. nanos would be output as "AvgTimeNanos"/"NumOps", while leaving things as "AvgTime"/"NumOps" by default if no unit is specified. I see your point about automated tooling, though.

          "XxxNanosAvgTime" is reasonable but I'm hesitant to emit a metric called "XxxNanosNumOps"...
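For comparison, the optional-unit idea floated in this comment could look roughly like the following. This is a hypothetical sketch of the naming only; no such parameter is confirmed to exist in MutableRate, and every name here is made up for illustration.

```java
import java.util.Optional;

public class UnitSuffixSketch {
  /** Average-time name: the unit, if given, is appended after "AvgTime". */
  public static String avgTimeName(String base, Optional<String> unit) {
    return base + "AvgTime" + unit.orElse("");
  }

  /** Count name: never carries a unit, since a count has none. */
  public static String numOpsName(String base) {
    return base + "NumOps";
  }

  public static void main(String[] args) {
    System.out.println(avgTimeName("Xxx", Optional.of("Nanos"))); // XxxAvgTimeNanos
    System.out.println(avgTimeName("Xxx", Optional.empty()));     // XxxAvgTime
    System.out.println(numOpsName("Xxx"));                        // XxxNumOps
  }
}
```

The trade-off raised in the thread still applies: names of this shape would differ from what existing monitoring tools expect from a rate metric.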

          zhz Zhe Zhang added a comment -

          "XxxNanosAvgTime" is reasonable but I'm hesitant to emit a metric called "XxxNanosNumOps"...

          DataNode already has multiple metrics named with that convention, e.g. FlushNanosAvgTime, so I guess that makes it less awkward.

          xkrogen Erik Krogen added a comment -

          Ah, thanks Zhe Zhang, didn't realize there was precedent for this already. Sounds good then! Uploading v001 patch which adds Nanos to the name and marks all of the variables in the class with their unit.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime   Comment
          0     reexec      13m 11s   Docker mode activated.
          +1    @author     0m 0s     The patch does not contain any @author tags.
          +1    test4tests  0m 0s     The patch appears to include 1 new or modified test files.
          +1    mvninstall  13m 44s   trunk passed
          +1    compile     0m 48s    trunk passed
          +1    checkstyle  0m 36s    trunk passed
          +1    mvnsite     0m 54s    trunk passed
          +1    mvneclipse  0m 14s    trunk passed
          +1    findbugs    1m 46s    trunk passed
          +1    javadoc     0m 47s    trunk passed
          +1    mvninstall  1m 1s     the patch passed
          +1    compile     1m 1s     the patch passed
          +1    javac       1m 1s     the patch passed
          +1    checkstyle  0m 42s    the patch passed
          +1    mvnsite     1m 7s     the patch passed
          +1    mvneclipse  0m 14s    the patch passed
          +1    whitespace  0m 0s     The patch has no whitespace issues.
          +1    xml         0m 1s     The patch has no ill-formed XML file.
          +1    findbugs    2m 13s    the patch passed
          +1    javadoc     0m 43s    the patch passed
          -1    unit        74m 53s   hadoop-hdfs in the patch failed.
          +1    asflicense  0m 24s    The patch does not generate ASF License warnings.
                            115m 55s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:612578f
          JIRA Issue HDFS-11615
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12863174/HDFS-11615.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 8779bbd68de0 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0cab572
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19072/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19072/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19072/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          zhz Zhe Zhang added a comment -

          Thanks Erik. I finally got my trunk build fixed. I verified output from MiniHadoopClusterManager and it LGTM.

          +1 on the patch. I'm committing to trunk~branch-2.7.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11595 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11595/)
          HDFS-11615. FSNamesystemLock metrics can be inaccurate due to (zhz: rev ad49098eb324e238d97db68d7239ed2c4d84afa0)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystemLock.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystemLock.java
          zhz Zhe Zhang added a comment -

          Committed to trunk~branch-2.7. Thanks Erik Krogen for the work and Andrew Wang for sharing the feedback.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.


            People

            • Assignee:
              xkrogen Erik Krogen
              Reporter:
              xkrogen Erik Krogen
            • Votes:
              0
              Watchers:
              10
