Flink / FLINK-11127

Make metrics query service establish connection to JobManager

    Details

      Description

      As part of FLINK-10247, the internal metrics query service has been separated into its own actor system. Before this change, the JobManager (JM) queried TaskManager (TM) metrics via the TM actor. Now, the JM needs to establish a separate connection to the TM metrics query service actor.

      In the context of Kubernetes, this is problematic as the JM will typically not be able to resolve the TMs by name, resulting in warnings as follows:

      2018-12-11 08:32:33,962 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink-metrics@flink-task-manager-64b868487c-x9l4b:39183] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@flink-task-manager-64b868487c-x9l4b:39183]] Caused by: [flink-task-manager-64b868487c-x9l4b: Name does not resolve]
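
      To check whether a warning like the one above is really a name resolution problem, it can help to pull the host out of the Akka address and try to resolve it from inside the JM pod. The following is only a minimal diagnostic sketch, not part of Flink: the helper names `parse_akka_address` and `can_resolve` and the regular expression are assumptions made for illustration, based solely on the address format visible in the log line.

```python
import re
import socket

# Hypothetical helper (not a Flink API): matches addresses of the form
# akka.tcp://<system>@<host>:<port>, as seen in the warning above.
AKKA_ADDRESS = re.compile(
    r"akka\.tcp://(?P<system>[^@/\s]+)@(?P<host>[^:/\s\]]+):(?P<port>\d+)"
)


def parse_akka_address(text):
    """Return (system, host, port) for the first Akka address in text, or None."""
    match = AKKA_ADDRESS.search(text)
    if match is None:
        return None
    return match.group("system"), match.group("host"), int(match.group("port"))


def can_resolve(host):
    """Return True if the local resolver can resolve host, False otherwise."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    line = (
        "Association with remote system "
        "[akka.tcp://flink-metrics@flink-task-manager-64b868487c-x9l4b:39183] "
        "has failed"
    )
    system, host, port = parse_akka_address(line)
    # Run from inside the JM pod to see whether the TM host name resolves there.
    print(system, host, port, "resolvable:", can_resolve(host))
```

      If `can_resolve` returns False inside the JM pod for the TM host name, that matches the "Name does not resolve" cause reported in the warning.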
      

      To expose the TMs by name in Kubernetes, users would need a separate service for each TM instance, which is not practical.

      As a result, the web UI is currently unable to display some basic metrics about the number of sent records. You can reproduce this by following the READMEs in flink-container/kubernetes.

      This worked before because the JM is typically exposed via a service with a known name, and the TMs establish the connection to it. The metrics query service piggybacked on that connection.

      A potential solution might be to let the metrics query service connect to the JM, similar to how the TMs register with it.

      I tagged this ticket as an improvement, but in the context of Kubernetes I would consider this to be a bug.

              People

              • Assignee: Unassigned
              • Reporter: Ufuk Celebi (uce)
              • Votes: 8
              • Watchers: 20

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 20m