FLINK-28171

Adjust Job and Task manager port definitions to work with Istio+mTLS

Details

    Description

      Hello,

       

      We are launching Flink deployments using the Flink Kubernetes Operator on a Kubernetes cluster with Istio and mTLS enabled.

       

      We found that the TaskManager is unable to communicate with the JobManager on the jobmanager-rpc port:

       

      2022-06-15 15:25:40,508 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge]] Caused by: [The remote system explicitly disassociated (reason unknown).]

       

      The cause is that the JobManager service port definitions do not follow the Istio protocol-selection guidelines https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/ : Istio expects either a port name of the form <protocol>[-<suffix>] or an explicit "appProtocol" field (see example below).

       

      This topic was also discussed on the user mailing list under the subject "Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions".

      With the help of the community, we were able to work around the issue, but the workaround was difficult and forced us to bypass the Istio proxy, which is not ideal.

       

      We would like you to consider changing the default port definitions in one of the following ways:

      1. Rename the ports. I understand this is an Istio-specific guideline, but it may be better to align with at least one (popular) vendor's guideline than with none at all.
      2. Add the "appProtocol" property [1], which is not specific to any vendor but requires Kubernetes >= 1.19, where it was introduced as beta (it became stable in 1.20). Note that the fabric8 kubernetes-client only gained the option to set appProtocol in https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0 with #3570.
      3. Allow a way to override the defaults.

       

      [1] https://kubernetes.io/docs/concepts/services-networking/_print/#application-protocol

       

       

      # k get service inference-results-to-analytics-engine -o yaml
      apiVersion: v1
      kind: Service
      ...
      spec:
        clusterIP: None
        ports:
        - name: jobmanager-rpc # should start with "tcp-" or add "appProtocol" property
          port: 6123
          protocol: TCP
          targetPort: 6123
        - name: blobserver # should start with "tcp-" or add "appProtocol" property
          port: 6124
          protocol: TCP
          targetPort: 6124
      ...
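
      For comparison, here is a rough sketch of how the same Service might look with option 1 applied to one port and option 2 applied to the other. This is an untested illustration, not the actual Flink output: the name "tcp-jobmanager-rpc" is hypothetical, and the two options are mixed in one Service only to show both side by side.

      ```yaml
      # Sketch only: illustrative port definitions, not what Flink generates today.
      apiVersion: v1
      kind: Service
      metadata:
        name: inference-results-to-analytics-engine
      spec:
        clusterIP: None
        ports:
        # Option 1: Istio naming convention <protocol>[-<suffix>].
        # "tcp-jobmanager-rpc" is a hypothetical name chosen for this example.
        - name: tcp-jobmanager-rpc
          port: 6123
          protocol: TCP
          targetPort: 6123
        # Option 2: keep the existing name and declare the protocol explicitly.
        # appProtocol requires Kubernetes >= 1.19.
        - name: blobserver
          appProtocol: tcp
          port: 6124
          protocol: TCP
          targetPort: 6124
      ```

      Either form should let Istio treat the port as plain TCP instead of relying on protocol detection. Option 2 has the advantage of leaving the existing port names untouched.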

            People

              elishamoshe Moshe Elisha