Flink / FLINK-8358

Hostname used by DataDog metric reporter is not configurable


Details

    Description

      The hostname used by the DataDog metric reporter to report metrics is not configurable. This can be problematic if the hostname that Flink uses differs from the hostname used by the system's DataDog agent.

      For instance, in our environment we use Chef, and through the DataDog Chef Handler certain metadata, such as host roles, is associated with the hostname in the DataDog service. The hostname used to submit this metadata is the name we have given the host. But because Flink picks up the default name that EC2 assigns to the instance, metrics that Flink submits to DataDog under that hostname are not associated with the tags derived from Chef.

      In the Job Manager we can avoid this issue by explicitly setting the config jobmanager.rpc.address to the hostname we desire. I attempted to do the same on the Task Manager by setting the taskmanager.hostname config, but the DataDog reporter does not seem to pick up that value.

      Digging through the code, it seems the DD metric reporter gets the hostname from the TaskManagerMetricGroup host variable, which seems to be set from taskManagerLocation.getHostname. That value in turn seems to be derived by calling this.inetAddress.getCanonicalHostName(), which merely performs a reverse lookup on the IP address, and then calling NetUtils.getHostnameFromFQDN on the result. The latter is further problematic because its result is a non-fully-qualified hostname.
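
      To illustrate the two steps described above, here is a minimal standalone sketch. It is not Flink's code: the class HostnameSketch and the helper stripDomain are hypothetical names, and stripDomain only assumes that the FQDN-to-hostname step keeps the text before the first dot. InetAddress.getLocalHost() stands in for the address the Task Manager would use.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; not taken from the Flink code base.
class HostnameSketch {

    // Assumed behaviour of the FQDN-to-hostname step:
    // keep everything before the first dot, or the whole string if there is no dot.
    static String stripDomain(String fqdn) {
        int dot = fqdn.indexOf('.');
        return dot == -1 ? fqdn : fqdn.substring(0, dot);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Stand-in for the address a Task Manager would bind to.
        InetAddress addr = InetAddress.getLocalHost();

        // Reverse lookup on the IP address; no configured hostname is consulted here.
        String canonical = addr.getCanonicalHostName();

        System.out.println("canonical: " + canonical);
        System.out.println("short:     " + stripDomain(canonical));
    }
}
```

      On an EC2 instance with default DNS, a derivation like this would be expected to yield the EC2-assigned name rather than the name given to the host, and the short form drops the domain part.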

      More generally, there seems to be a need to specify a hostname for a JM or TM node that can be reused across Flink components.

            People

              Assignee: Unassigned
              Reporter: Elias Levy (elevy)