Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-6129

MetricRegistry does not stop query actor

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: Metrics
    • Labels:
      None

      Description

      The MetricRegistry does not properly close the started query actor upon shutdown. This can be a resource leak.

        Issue Links

          Activity

          Hide
          Zentol Chesnay Schepler added a comment -

          I thought all actors are properly shutdown when the actorSystem is shutdown?

          Show
          Zentol Chesnay Schepler added a comment - I thought all actors are properly shutdown when the actorSystem is shutdown?
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/3573

          FLINK-6129 [metrics] Stop query actor of MetricRegistry

          This PR properly shuts down the query actor of the MetricRegistry upon shut down.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink shutdownQueryActor

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3573.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3573


          commit 02d30ae9a8837452a14595fd6b32ce6a11240da4
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2017-03-20T13:55:30Z

          FLINK-6129 [metrics] Stop query actor of MetricRegistry

          This PR properly shuts down the query actor of the MetricRegistry upon shut down.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/3573 FLINK-6129 [metrics] Stop query actor of MetricRegistry This PR properly shuts down the query actor of the MetricRegistry upon shut down. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink shutdownQueryActor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3573.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3573 commit 02d30ae9a8837452a14595fd6b32ce6a11240da4 Author: Till Rohrmann <trohrmann@apache.org> Date: 2017-03-20T13:55:30Z FLINK-6129 [metrics] Stop query actor of MetricRegistry This PR properly shuts down the query actor of the MetricRegistry upon shut down.
          Hide
          till.rohrmann Till Rohrmann added a comment -

          Chesnay Schepler they are, but who says that the underlying ActorSystem is shut down when the MetricRegistry is shut down?

          Show
          till.rohrmann Till Rohrmann added a comment - Chesnay Schepler they are, but who says that the underlying ActorSystem is shut down when the MetricRegistry is shut down?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3573#discussion_r106959941

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java —
          @@ -198,6 +204,14 @@ public boolean isShutdown() {

          • Shuts down this registry and the associated {@link MetricReporter}

            .
            */
            public void shutdown() {
            + Future<Boolean> stopFuture = null;

              • End diff –

          why do you split the additions in 2 blocks? Could you not merge them with the sole condition that the MQS is not null?

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3573#discussion_r106959941 — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java — @@ -198,6 +204,14 @@ public boolean isShutdown() { Shuts down this registry and the associated {@link MetricReporter} . */ public void shutdown() { + Future<Boolean> stopFuture = null; End diff – why do you split the additions in 2 blocks? Could you not merge them with the sole condition that the MQS is not null?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3573#discussion_r107124400

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java —
          @@ -198,6 +204,14 @@ public boolean isShutdown() {

          • Shuts down this registry and the associated {@link MetricReporter}

            .
            */
            public void shutdown() {
            + Future<Boolean> stopFuture = null;

              • End diff –

          Because that way we can exploit the asynchronous stop operation of the actors without blocking the stopping of the reporters and the executors.

          Show
          githubbot ASF GitHub Bot added a comment - Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/3573#discussion_r107124400 — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java — @@ -198,6 +204,14 @@ public boolean isShutdown() { Shuts down this registry and the associated {@link MetricReporter} . */ public void shutdown() { + Future<Boolean> stopFuture = null; End diff – Because that way we can exploit the asynchronous stop operation of the actors without blocking the stopping of the reporters and the executors.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on the issue:

          https://github.com/apache/flink/pull/3573

          Since the MQS is explicitly started i was wondering whether it should also be stopped explicitly via ```stopMetricQueryService()```. The ```shutdown()``` method is right now the counterpart to the MR constructor, which doesn't start the MQS.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/3573 Since the MQS is explicitly started i was wondering whether it should also be stopped explicitly via ```stopMetricQueryService()```. The ```shutdown()``` method is right now the counterpart to the MR constructor, which doesn't start the MQS.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zentol commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3573#discussion_r107615168

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java —
          @@ -209,14 +223,29 @@ public void shutdown()

          { reporters = null; }

          shutdownExecutor();
          +
          + if (stopFuture != null) {
          — End diff –

          if the MQS is started while a shutdown is in progress it might not be cleaned up. What do you think about adding a lock here and in startMetricQueryService?

          Show
          githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3573#discussion_r107615168 — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java — @@ -209,14 +223,29 @@ public void shutdown() { reporters = null; } shutdownExecutor(); + + if (stopFuture != null) { — End diff – if the MQS is started while a shutdown is in progress it might not be cleaned up. What do you think about adding a lock here and in startMetricQueryService?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3573#discussion_r107639246

          — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java —
          @@ -209,14 +223,29 @@ public void shutdown()

          { reporters = null; }

          shutdownExecutor();
          +
          + if (stopFuture != null) {
          — End diff –

          That is a good point. Will do that.

          Show
          githubbot ASF GitHub Bot added a comment - Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/3573#discussion_r107639246 — Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistry.java — @@ -209,14 +223,29 @@ public void shutdown() { reporters = null; } shutdownExecutor(); + + if (stopFuture != null) { — End diff – That is a good point. Will do that.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3573

          Thanks for the review @zentol.

          I think it's ok to close the query service in the `shutdown` method. If it later should turn out to be a disadvantage, we can still change it.

          After addressing your comments, I'll merge the PR.

          Show
          githubbot ASF GitHub Bot added a comment - Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/3573 Thanks for the review @zentol. I think it's ok to close the query service in the `shutdown` method. If it later should turn out to be a disadvantage, we can still change it. After addressing your comments, I'll merge the PR.
          Hide
          till.rohrmann Till Rohrmann added a comment -

          Fixed via 8319a457d9adee310ef64905709c03ca2f2afd61

          Show
          till.rohrmann Till Rohrmann added a comment - Fixed via 8319a457d9adee310ef64905709c03ca2f2afd61
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann closed the pull request at:

          https://github.com/apache/flink/pull/3573

          Show
          githubbot ASF GitHub Bot added a comment - Github user tillrohrmann closed the pull request at: https://github.com/apache/flink/pull/3573

            People

            • Assignee:
              till.rohrmann Till Rohrmann
              Reporter:
              till.rohrmann Till Rohrmann
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development