Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-11907

Add metric for time taken by NameNode resource check

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Add a metric to measure the time taken by the NameNode Resource Check.

      1. HDFS-11907.007.patch
        8 kB
        Chen Liang
      2. HDFS-11907.006.patch
        6 kB
        Chen Liang
      3. HDFS-11907.005.patch
        7 kB
        Chen Liang
      4. HDFS-11907.004.patch
        7 kB
        Chen Liang
      5. HDFS-11907.003.patch
        7 kB
        Chen Liang
      6. HDFS-11907.002.patch
        3 kB
        Chen Liang
      7. HDFS-11907.001.patch
        2 kB
        Chen Liang

        Activity

        Hide
        vagarychen Chen Liang added a comment - - edited

        Post v001 patch to only call availableSpace = df.getAvailable() to update the value at most every 5 seconds.

        Show
        vagarychen Chen Liang added a comment - - edited Post v001 patch to only call availableSpace = df.getAvailable() to update the value at most every 5 seconds.
        Hide
        vagarychen Chen Liang added a comment -

        Post v002 patch to merge the change from HDFS-11906, which adds a log warning if NameNode#monitorHealth is taking too long.

        Show
        vagarychen Chen Liang added a comment - Post v002 patch to merge the change from HDFS-11906 , which adds a log warning if NameNode#monitorHealth is taking too long.
        Hide
        hanishakoneru Hanisha Koneru added a comment -

        Thanks for the contribution Chen Liang. Patch v02 LGTM.

        Show
        hanishakoneru Hanisha Koneru added a comment - Thanks for the contribution Chen Liang . Patch v02 LGTM.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 23s Docker mode activated.
        +1 @author 0m 1s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 15m 27s trunk passed
        +1 compile 0m 57s trunk passed
        +1 checkstyle 0m 41s trunk passed
        +1 mvnsite 1m 5s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 1m 57s trunk passed
        +1 javadoc 0m 46s trunk passed
        +1 mvninstall 1m 1s the patch passed
        +1 compile 0m 56s the patch passed
        +1 javac 0m 56s the patch passed
        +1 checkstyle 0m 37s the patch passed
        +1 mvnsite 1m 3s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 2m 0s the patch passed
        +1 javadoc 0m 39s the patch passed
        -1 unit 72m 1s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 22s The patch does not generate ASF License warnings.
        101m 49s



        Reason Tests
        Failed junit tests hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
          hadoop.hdfs.TestBlockStoragePolicy
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11907
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12870625/HDFS-11907.002.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 61a3ffd5fde6 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 4369690
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19707/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19707/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19707/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 23s Docker mode activated. +1 @author 0m 1s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 15m 27s trunk passed +1 compile 0m 57s trunk passed +1 checkstyle 0m 41s trunk passed +1 mvnsite 1m 5s trunk passed +1 mvneclipse 0m 15s trunk passed +1 findbugs 1m 57s trunk passed +1 javadoc 0m 46s trunk passed +1 mvninstall 1m 1s the patch passed +1 compile 0m 56s the patch passed +1 javac 0m 56s the patch passed +1 checkstyle 0m 37s the patch passed +1 mvnsite 1m 3s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 0s the patch passed +1 javadoc 0m 39s the patch passed -1 unit 72m 1s hadoop-hdfs in the patch failed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 101m 49s Reason Tests Failed junit tests hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080   hadoop.hdfs.TestBlockStoragePolicy   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting   hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11907 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12870625/HDFS-11907.002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 61a3ffd5fde6 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 4369690 Default Java 1.8.0_131 findbugs v3.1.0-RC1 unit https://builds.apache.org/job/PreCommit-HDFS-Build/19707/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19707/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19707/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        arpitagarwal Arpit Agarwal added a comment - - edited

        Thanks for this improvement Chen! A few comments:

        1. We should use Time#monotonicNow instead of System#currentTimeMillis in both files. Time#monotonicNow also returns a millisecond value, but it is guaranteed to be monotonically increasing.
        2. Instead of initializing availableSpeceTimeStamp to zero, we should initialize it to (Time#monotonicNow - 5000) since 0 can be a valid timestamp returned by nanoTime.
        3. You can also log the IP address of the client that issued the request to aid debugging. It can be retrieved in an RPC call by calling Server.getRemoteIp()* Server.getIpAddr()
        4. Typo: availabeSpeceTimeStamp --> availableSpaceTimeStamp.
        5. Let's Replace 5000 and 3000 with static final ints.
        6. See if you can write an isolated unit test for NameNodeResourceChecker. e.g. the first call to isResourceAvailable should update availableSpaceTimeStamp, subsequent calls immediately should not. Then if you advance the timer (see FakeTimer) and call isResourceAvailable again, availableSpaceTimeStamp should be updated.

        *thanks for the correction Chen.

        Show
        arpitagarwal Arpit Agarwal added a comment - - edited Thanks for this improvement Chen! A few comments: We should use Time#monotonicNow instead of System#currentTimeMillis in both files. Time#monotonicNow also returns a millisecond value, but it is guaranteed to be monotonically increasing. Instead of initializing availableSpeceTimeStamp to zero, we should initialize it to (Time#monotonicNow - 5000) since 0 can be a valid timestamp returned by nanoTime. You can also log the IP address of the client that issued the request to aid debugging. It can be retrieved in an RPC call by calling Server.getRemoteIp()* Server.getIpAddr() Typo: availabeSpeceTimeStamp --> availableSpaceTimeStamp. Let's Replace 5000 and 3000 with static final ints. See if you can write an isolated unit test for NameNodeResourceChecker. e.g. the first call to isResourceAvailable should update availableSpaceTimeStamp, subsequent calls immediately should not. Then if you advance the timer (see FakeTimer) and call isResourceAvailable again, availableSpaceTimeStamp should be updated. *thanks for the correction Chen.
        Hide
        vagarychen Chen Liang added a comment - - edited

        Thanks Hanisha Koneru for the review, and thanks Arpit Agarwal for the comments! all addressed in v003 patch.

        Show
        vagarychen Chen Liang added a comment - - edited Thanks Hanisha Koneru for the review, and thanks Arpit Agarwal for the comments! all addressed in v003 patch.
        Hide
        arpitagarwal Arpit Agarwal added a comment - - edited

        Thanks for updating the patch Chen Liang. A few minor comments:

        1. Typo: availabeSpaceTimeStamp --> availableSpaceTimeStamp (also the getter).
        2. Typo: DF_STALE_THREASHOLD_MS --> DF_STALE_THRESHOLD_MS
        3. Looks like the two threshold values were swapped from v2 to v3 patch. I think the v2 values were better.
        4. getAvailabeSpaceTimeStamp can be package private. Also the new constructor.
        5. in the test case, can you add an assert for the // first call guarantees an update e.g.
              // first call guarantees an update
              long val0 = nb.getAvailabeSpaceTimeStamp();
              volume.isResourceAvailable();
              long val1 = nb.getAvailabeSpaceTimeStamp();
              assertEquals(val0 + 5000, val1);
              timer.advance(2000);
          

        Looks good otherwise!

        Show
        arpitagarwal Arpit Agarwal added a comment - - edited Thanks for updating the patch Chen Liang . A few minor comments: Typo: availabeSpaceTimeStamp --> availableSpaceTimeStamp (also the getter). Typo: DF_STALE_THREASHOLD_MS --> DF_STALE_THRESHOLD_MS Looks like the two threshold values were swapped from v2 to v3 patch. I think the v2 values were better. getAvailabeSpaceTimeStamp can be package private. Also the new constructor. in the test case, can you add an assert for the // first call guarantees an update e.g. // first call guarantees an update long val0 = nb.getAvailabeSpaceTimeStamp(); volume.isResourceAvailable(); long val1 = nb.getAvailabeSpaceTimeStamp(); assertEquals(val0 + 5000, val1); timer.advance(2000); Looks good otherwise!
        Hide
        kihwal Kihwal Lee added a comment -

        The statfs() call is not expensive, but I agree that the current calling frequency is a bit excessive. Since it is one of conditions that can trigger a NN failover, any change here will affect the failure detection latency. The increase from max 1 second latency to max 5 seems acceptable.

        Show
        kihwal Kihwal Lee added a comment - The statfs() call is not expensive, but I agree that the current calling frequency is a bit excessive. Since it is one of conditions that can trigger a NN failover, any change here will affect the failure detection latency. The increase from max 1 second latency to max 5 seems acceptable.
        Hide
        vagarychen Chen Liang added a comment -

        Thanks Arpit Agarwal and Kihwal Lee for the comments! Post v004 patch to address Arpit's comments.

        Show
        vagarychen Chen Liang added a comment - Thanks Arpit Agarwal and Kihwal Lee for the comments! Post v004 patch to address Arpit's comments.
        Hide
        andrew.wang Andrew Wang added a comment -

        Sorry for coming to this late, but is DFCachingGetSpaceUsed useful here? Seems related / similar.

        Show
        andrew.wang Andrew Wang added a comment - Sorry for coming to this late, but is DFCachingGetSpaceUsed useful here? Seems related / similar.
        Hide
        vagarychen Chen Liang added a comment -

        Thanks Andrew Wang for the comments! We prefer not to use it here though because:
        1. the change of this JIRA is about maintaining available space value, while DFCachingGetSpaceUsed is to get used space. So we will have to make further modification to this class (or create new) if we want to use it.
        2. seems that each instance of this class will use an extra background thread that periodically updates the value, which seems a bit overkill to me.

        But if you do think it is better to use DFCachingGetSpaceUsed, I will try to update with another patch.

        Show
        vagarychen Chen Liang added a comment - Thanks Andrew Wang for the comments! We prefer not to use it here though because: 1. the change of this JIRA is about maintaining available space value, while DFCachingGetSpaceUsed is to get used space . So we will have to make further modification to this class (or create new) if we want to use it. 2. seems that each instance of this class will use an extra background thread that periodically updates the value, which seems a bit overkill to me. But if you do think it is better to use DFCachingGetSpaceUsed, I will try to update with another patch.
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        I am +1 on the v4 patch. I will hold off committing, pending Andrew's response.

        Show
        arpitagarwal Arpit Agarwal added a comment - I am +1 on the v4 patch. I will hold off committing, pending Andrew's response.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 36s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 16m 48s trunk passed
        +1 compile 0m 51s trunk passed
        +1 checkstyle 0m 37s trunk passed
        +1 mvnsite 0m 54s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 1m 51s trunk passed
        +1 javadoc 0m 44s trunk passed
        +1 mvninstall 1m 1s the patch passed
        +1 compile 0m 56s the patch passed
        +1 javac 0m 56s the patch passed
        -0 checkstyle 0m 37s hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 111 unchanged - 0 fixed = 116 total (was 111)
        +1 mvnsite 1m 1s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 2m 0s the patch passed
        +1 javadoc 0m 38s the patch passed
        -1 unit 92m 37s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 21s The patch does not generate ASF License warnings.
        123m 29s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork
          hadoop.hdfs.server.namenode.TestNameNodeResourceChecker
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11907
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12870883/HDFS-11907.004.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 9229a80e5966 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 7101477
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19733/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19733/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19733/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19733/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 36s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 16m 48s trunk passed +1 compile 0m 51s trunk passed +1 checkstyle 0m 37s trunk passed +1 mvnsite 0m 54s trunk passed +1 mvneclipse 0m 15s trunk passed +1 findbugs 1m 51s trunk passed +1 javadoc 0m 44s trunk passed +1 mvninstall 1m 1s the patch passed +1 compile 0m 56s the patch passed +1 javac 0m 56s the patch passed -0 checkstyle 0m 37s hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 111 unchanged - 0 fixed = 116 total (was 111) +1 mvnsite 1m 1s the patch passed +1 mvneclipse 0m 14s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 0s the patch passed +1 javadoc 0m 38s the patch passed -1 unit 92m 37s hadoop-hdfs in the patch failed. +1 asflicense 0m 21s The patch does not generate ASF License warnings. 123m 29s Reason Tests Failed junit tests hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork   hadoop.hdfs.server.namenode.TestNameNodeResourceChecker   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11907 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12870883/HDFS-11907.004.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 9229a80e5966 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 7101477 Default Java 1.8.0_131 findbugs v3.1.0-RC1 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19733/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19733/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19733/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19733/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        andrew.wang Andrew Wang added a comment -

        Thanks for the reply Chen Liang. Just thought I'd mention the other class in case there was potential for code sharing.

        Could you expand a little on why caching is necessary here? As Kihwal said, df is normally pretty cheap, so I'm curious why we need to do this. We could also get possibly the same outcome by increasing the monitorHealth check interval from 1s to 5s.

        Show
        andrew.wang Andrew Wang added a comment - Thanks for the reply Chen Liang . Just thought I'd mention the other class in case there was potential for code sharing. Could you expand a little on why caching is necessary here? As Kihwal said, df is normally pretty cheap, so I'm curious why we need to do this. We could also get possibly the same outcome by increasing the monitorHealth check interval from 1s to 5s.
        Hide
        vagarychen Chen Liang added a comment - - edited

        Thanks Andrew Wang for the reply!

        df does seem to be a fairly cheap operation in general, but we've seen cases where we suspect it was this call being slow under certain conditions, which we are still doing analysis. About changing monitorHealth check interval, since we still want ZKFC process to try to contact NameNode frequently enough to detect process crash ASAP, we probably don't want to lower the frequency from caller's side.

        Show
        vagarychen Chen Liang added a comment - - edited Thanks Andrew Wang for the reply! df does seem to be a fairly cheap operation in general, but we've seen cases where we suspect it was this call being slow under certain conditions, which we are still doing analysis. About changing monitorHealth check interval, since we still want ZKFC process to try to contact NameNode frequently enough to detect process crash ASAP, we probably don't want to lower the frequency from caller's side.
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        Hi Andrew Wang, if you have no objections I will commit Chen's v4 patch by EOD today.

        Show
        arpitagarwal Arpit Agarwal added a comment - Hi Andrew Wang , if you have no objections I will commit Chen's v4 patch by EOD today.
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        Hi Chen Liang, I just noticed that TestNameNodeResourceChecker failed in the precommit run.

        Can you please take a look?

        Show
        arpitagarwal Arpit Agarwal added a comment - Hi Chen Liang , I just noticed that TestNameNodeResourceChecker failed in the precommit run. Can you please take a look?
        Hide
        andrew.wang Andrew Wang added a comment -

        Hi, sorry for the slow reply, was out on Friday and Monday,

        If we haven't confirmed the problem, I'd support adding additional logs or metrics for better debugging, but making behavior changes seems premature. Is there a metric for the df call that we can look at to confirm slowness? Other host-level statistics that we can check?

        Show
        andrew.wang Andrew Wang added a comment - Hi, sorry for the slow reply, was out on Friday and Monday, If we haven't confirmed the problem, I'd support adding additional logs or metrics for better debugging, but making behavior changes seems premature. Is there a metric for the df call that we can look at to confirm slowness? Other host-level statistics that we can check?
        Hide
        arpitagarwal Arpit Agarwal added a comment - - edited

        Hi Andrew Wang, you are right that it's expected to be a cheap call, but calling it once per second per volume seems excessive. Do you see any benefit to querying df once per second? We can make the caching interval configurable and leave the default at 1 second if you prefer.

        This is not the same as changing the health check interval as Chen mentioned. Keeping the health check interval at 1 second lets us detect process failure faster and we don't want to change that.

        Also the v4 patch has a couple of issues I missed earlier. Chen Liang can you please take a look at these?

        1. availableSpace and availableSpaceTimeStamp should be members of checkedVolume.
        2. The test case failure in TestNameNodeResourceChecker needs to be addressed. An easy fix is to check all volumes instead of trying to query a specific one.
        Show
        arpitagarwal Arpit Agarwal added a comment - - edited Hi Andrew Wang , you are right that it's expected to be a cheap call, but calling it once per second per volume seems excessive. Do you see any benefit to querying df once per second? We can make the caching interval configurable and leave the default at 1 second if you prefer. This is not the same as changing the health check interval as Chen mentioned. Keeping the health check interval at 1 second lets us detect process failure faster and we don't want to change that. Also the v4 patch has a couple of issues I missed earlier. Chen Liang can you please take a look at these? availableSpace and availableSpaceTimeStamp should be members of checkedVolume. The test case failure in TestNameNodeResourceChecker needs to be addressed. An easy fix is to check all volumes instead of trying to query a specific one.
        Hide
        andrew.wang Andrew Wang added a comment -

        I'm overall ambivalent to the df frequency, but I'd rather not add complexity if it's not necessary. I'd like to confirm the problem is the frequency of the df call before we commit a behavior change.

        Repeating my previous comment, are there additional metrics or logs we can add to help debug this for next time? I definitely support that.

        Show
        andrew.wang Andrew Wang added a comment - I'm overall ambivalent to the df frequency, but I'd rather not add complexity if it's not necessary. I'd like to confirm the problem is the frequency of the df call before we commit a behavior change. Repeating my previous comment, are there additional metrics or logs we can add to help debug this for next time? I definitely support that.
        Hide
        vagarychen Chen Liang added a comment -

        Thanks Arpit Agarwal and Andrew Wang for the follow-up. Are you -1 on making the df interval configurable Andrew?

        Show
        vagarychen Chen Liang added a comment - Thanks Arpit Agarwal and Andrew Wang for the follow-up. Are you -1 on making the df interval configurable Andrew?
        Hide
        andrew.wang Andrew Wang added a comment -

        I would appreciate some diligence investigating the root cause of the problem. I don't think we should commit a behavior change when there isn't a degree of confidence that it'll solve the stated problem.

        Based on the discussion, we (Kihwal, myself, Arpit) expect df to be a cheap call. Given that, there is not a degree of confidence that this change will speedup health checks. I'd like to hear at least a theory as to why df would be slow in a spurious way before we add complexity.

        I'm not going to burn bridges over this patch if someone really wants to commit it, but as I said before, I would much rather see a patch to add new metrics or instrumentation to root cause the problem.

        Show
        andrew.wang Andrew Wang added a comment - I would appreciate some diligence investigating the root cause of the problem. I don't think we should commit a behavior change when there isn't a degree of confidence that it'll solve the stated problem. Based on the discussion, we (Kihwal, myself, Arpit) expect df to be a cheap call. Given that, there is not a degree of confidence that this change will speedup health checks. I'd like to hear at least a theory as to why df would be slow in a spurious way before we add complexity. I'm not going to burn bridges over this patch if someone really wants to commit it, but as I said before, I would much rather see a patch to add new metrics or instrumentation to root cause the problem.
        Hide
        vagarychen Chen Liang added a comment - - edited

        Thanks Andrew Wang for the comments. Based on Andrew's comments, instead of making the change to cache the value, post v005 patch to add a metric to keep track of available resource check time.

        Show
        vagarychen Chen Liang added a comment - - edited Thanks Andrew Wang for the comments. Based on Andrew's comments, instead of making the change to cache the value, post v005 patch to add a metric to keep track of available resource check time.
        Hide
        vagarychen Chen Liang added a comment - - edited

        Thanks Arpit Agarwal for pointing out offline that, checkAvailableResources has more than one callers. Post v006 patch to move the measurement into this method to capture all the code paths.

        Show
        vagarychen Chen Liang added a comment - - edited Thanks Arpit Agarwal for pointing out offline that, checkAvailableResources has more than one callers. Post v006 patch to move the measurement into this method to capture all the code paths.
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        +1 for the v6 patch. Thanks Chen.

        Andrew Wang do you have any comments on the latest patch?

        Show
        arpitagarwal Arpit Agarwal added a comment - +1 for the v6 patch. Thanks Chen. Andrew Wang do you have any comments on the latest patch?
        Hide
        andrew.wang Andrew Wang added a comment -

        LGTM +1, thanks Chen, Arpit!

        Show
        andrew.wang Andrew Wang added a comment - LGTM +1, thanks Chen, Arpit!
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 1m 1s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 14m 49s trunk passed
        +1 compile 0m 50s trunk passed
        +1 checkstyle 0m 40s trunk passed
        +1 mvnsite 0m 57s trunk passed
        +1 findbugs 1m 52s trunk passed
        +1 javadoc 0m 51s trunk passed
        +1 mvninstall 0m 53s the patch passed
        +1 compile 0m 46s the patch passed
        +1 javac 0m 46s the patch passed
        -0 checkstyle 0m 35s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 157 unchanged - 0 fixed = 158 total (was 157)
        +1 mvnsite 0m 50s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 48s the patch passed
        +1 javadoc 0m 37s the patch passed
        -1 unit 68m 31s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 20s The patch does not generate ASF License warnings.
        96m 55s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11907
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872340/HDFS-11907.005.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 18ca1e176c4a 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 325163f
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19855/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19855/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19855/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19855/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 1m 1s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 14m 49s trunk passed +1 compile 0m 50s trunk passed +1 checkstyle 0m 40s trunk passed +1 mvnsite 0m 57s trunk passed +1 findbugs 1m 52s trunk passed +1 javadoc 0m 51s trunk passed +1 mvninstall 0m 53s the patch passed +1 compile 0m 46s the patch passed +1 javac 0m 46s the patch passed -0 checkstyle 0m 35s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 157 unchanged - 0 fixed = 158 total (was 157) +1 mvnsite 0m 50s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 48s the patch passed +1 javadoc 0m 37s the patch passed -1 unit 68m 31s hadoop-hdfs in the patch failed. +1 asflicense 0m 20s The patch does not generate ASF License warnings. 96m 55s Reason Tests Failed junit tests hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11907 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872340/HDFS-11907.005.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 18ca1e176c4a 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 325163f Default Java 1.8.0_131 findbugs v3.1.0-RC1 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19855/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19855/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19855/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19855/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 20s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 14m 5s trunk passed
        +1 compile 0m 51s trunk passed
        +1 checkstyle 0m 43s trunk passed
        +1 mvnsite 0m 57s trunk passed
        +1 findbugs 1m 43s trunk passed
        +1 javadoc 0m 41s trunk passed
        +1 mvninstall 0m 52s the patch passed
        +1 compile 0m 45s the patch passed
        +1 javac 0m 45s the patch passed
        -0 checkstyle 0m 35s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 249 unchanged - 0 fixed = 250 total (was 249)
        +1 mvnsite 0m 50s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 45s the patch passed
        +1 javadoc 0m 36s the patch passed
        -1 unit 99m 42s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 22s The patch does not generate ASF License warnings.
        126m 11s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140
          hadoop.hdfs.server.balancer.TestBalancer
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11907
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872355/HDFS-11907.006.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 04b8bf714a5d 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 5578af8
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19857/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19857/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19857/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19857/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 20s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 14m 5s trunk passed +1 compile 0m 51s trunk passed +1 checkstyle 0m 43s trunk passed +1 mvnsite 0m 57s trunk passed +1 findbugs 1m 43s trunk passed +1 javadoc 0m 41s trunk passed +1 mvninstall 0m 52s the patch passed +1 compile 0m 45s the patch passed +1 javac 0m 45s the patch passed -0 checkstyle 0m 35s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 249 unchanged - 0 fixed = 250 total (was 249) +1 mvnsite 0m 50s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 45s the patch passed +1 javadoc 0m 36s the patch passed -1 unit 99m 42s hadoop-hdfs in the patch failed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 126m 11s Reason Tests Failed junit tests hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140   hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11907 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872355/HDFS-11907.006.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 04b8bf714a5d 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 5578af8 Default Java 1.8.0_131 findbugs v3.1.0-RC1 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19857/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19857/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19857/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19857/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        The TestNameNodeMetrics failure looks related. I think the unit test needs to use a separate MiniDFSCluster instance, since the @Before method already started another one that is leaked by testResourceCheck.

        Also while you are fixing that, could you please add the log message that was in an earlier patch revision. i.e. if the resource check takes > 5 seconds, log a warning in FSNameSystem#checkAvailableResources?

        Show
        arpitagarwal Arpit Agarwal added a comment - The TestNameNodeMetrics failure looks related. I think the unit test needs to use a separate MiniDFSCluster instance, since the @Before method already started another one that is leaked by testResourceCheck. Also while you are fixing that, could you please add the log message that was in an earlier patch revision. i.e. if the resource check takes > 5 seconds, log a warning in FSNameSystem#checkAvailableResources ?
        Hide
        vagarychen Chen Liang added a comment -

        Thansk Arpit Agarwal for the catch! addressed in v007 patch.

        Show
        vagarychen Chen Liang added a comment - Thansk Arpit Agarwal for the catch! addressed in v007 patch.
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        +1 for the v7 patch, pending Jenkins.

        There is a minor typo in the comment, it should say 5 seconds. I will fix it while committing.

        log a warning if it take >= 3 seconds

        Show
        arpitagarwal Arpit Agarwal added a comment - +1 for the v7 patch, pending Jenkins. There is a minor typo in the comment, it should say 5 seconds. I will fix it while committing. log a warning if it take >= 3 seconds
        Hide
        vagarychen Chen Liang added a comment -

        Whoops... thanks Arpit Agarwal!

        Show
        vagarychen Chen Liang added a comment - Whoops... thanks Arpit Agarwal !
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 23s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 13m 48s trunk passed
        +1 compile 0m 53s trunk passed
        +1 checkstyle 0m 46s trunk passed
        +1 mvnsite 1m 0s trunk passed
        +1 findbugs 1m 43s trunk passed
        +1 javadoc 0m 44s trunk passed
        +1 mvninstall 0m 51s the patch passed
        +1 compile 0m 49s the patch passed
        +1 javac 0m 48s the patch passed
        -0 checkstyle 0m 38s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 343 unchanged - 0 fixed = 345 total (was 343)
        +1 mvnsite 0m 58s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 58s the patch passed
        +1 javadoc 0m 40s the patch passed
        -1 unit 94m 11s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 21s The patch does not generate ASF License warnings.
        121m 5s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
          hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
          hadoop.hdfs.server.datanode.TestDataNodeUUID



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HDFS-11907
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872746/HDFS-11907.007.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 48f04b8acaae 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 86368cc
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19885/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/19885/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19885/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19885/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 23s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 13m 48s trunk passed +1 compile 0m 53s trunk passed +1 checkstyle 0m 46s trunk passed +1 mvnsite 1m 0s trunk passed +1 findbugs 1m 43s trunk passed +1 javadoc 0m 44s trunk passed +1 mvninstall 0m 51s the patch passed +1 compile 0m 49s the patch passed +1 javac 0m 48s the patch passed -0 checkstyle 0m 38s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 343 unchanged - 0 fixed = 345 total (was 343) +1 mvnsite 0m 58s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 58s the patch passed +1 javadoc 0m 40s the patch passed -1 unit 94m 11s hadoop-hdfs in the patch failed. +1 asflicense 0m 21s The patch does not generate ASF License warnings. 121m 5s Reason Tests Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy   hadoop.hdfs.server.datanode.TestDataNodeUUID Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HDFS-11907 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872746/HDFS-11907.007.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 48f04b8acaae 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 86368cc Default Java 1.8.0_131 findbugs v3.1.0-RC1 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19885/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/19885/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19885/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19885/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        vagarychen Chen Liang added a comment -

        The failed test are unrelated.

        Show
        vagarychen Chen Liang added a comment - The failed test are unrelated.
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        I've committed this to trunk and branch-2 with the following additional delta to fix the checkstyle issues in the test case:

        --- a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
        +++ b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
        @@ -72,14 +72,12 @@
         import org.apache.hadoop.hdfs.tools.NNHAServiceTarget;
         import org.apache.hadoop.metrics2.MetricsRecordBuilder;
         import org.apache.hadoop.metrics2.MetricsSource;
        -import org.apache.hadoop.metrics2.annotation.Metric;
         import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
         import org.apache.hadoop.test.GenericTestUtils;
         import org.apache.hadoop.test.MetricsAsserts;
         import org.apache.log4j.Level;
         import org.junit.After;
         import org.junit.Before;
        -import org.junit.Ignore;
        
        Show
        arpitagarwal Arpit Agarwal added a comment - I've committed this to trunk and branch-2 with the following additional delta to fix the checkstyle issues in the test case: --- a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java +++ b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java @@ -72,14 +72,12 @@ import org.apache.hadoop.hdfs.tools.NNHAServiceTarget; import org.apache.hadoop.metrics2.MetricsRecordBuilder; import org.apache.hadoop.metrics2.MetricsSource; - import org.apache.hadoop.metrics2.annotation.Metric; import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem; import org.apache.hadoop.test.GenericTestUtils; import org.apache.hadoop.test.MetricsAsserts; import org.apache.log4j.Level; import org.junit.After; import org.junit.Before; - import org.junit.Ignore;
        Hide
        arpitagarwal Arpit Agarwal added a comment -

        Thanks for the contribution Chen Liang. Thanks Kihwal and Andrew for the discussion.

        Show
        arpitagarwal Arpit Agarwal added a comment - Thanks for the contribution Chen Liang . Thanks Kihwal and Andrew for the discussion.
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11859 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11859/)
        HDFS-11907. Add metric for time taken by NameNode resource check. (arp: rev 3f0a727f7585147207f2a011816434d0002b5284)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11859 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11859/ ) HDFS-11907 . Add metric for time taken by NameNode resource check. (arp: rev 3f0a727f7585147207f2a011816434d0002b5284) (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java

          People

          • Assignee:
            vagarychen Chen Liang
            Reporter:
            vagarychen Chen Liang
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development