Spark / SPARK-42411

Better support for Istio service mesh while running Spark on Kubernetes

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.3
    • Fix Version/s: None
    • Component/s: Kubernetes

    Description

      Support for Strict MTLS

      Under a strict MTLS PeerAuthentication policy, Istio requires each pod to be associated with a service identity, which lets listeners use the correct certificate and chain. Without a service identity, traffic goes through the passthrough cluster, which is not permitted in strict mode. The Istio community is still investigating communication through pod IPs under strict MTLS (https://github.com/istio/istio/issues/37431#issuecomment-1412831780). Today the Spark Kubernetes backend creates a service record for the driver, but executor pods register with the driver using their pod IPs. In this model the TLS handshake therefore fails between the driver and the executors, and also between executors. As part of this Jira we want to add service records for the executor pods as well. This can be achieved by adding an ExecutorServiceFeatureStep, similar to the existing DriverServiceFeatureStep.
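
To make the proposal concrete, here is a rough sketch of the kind of per-executor headless Service such a step might emit, written as a plain Python dictionary rather than as the eventual Scala feature step. The function name, the service naming scheme, and the port name are hypothetical. The selector label keys are assumed to match the labels Spark places on executor pods; verify them against the Spark version in use.

```python
from typing import Any, Dict


def executor_service_manifest(app_id: str, executor_id: int, port: int) -> Dict[str, Any]:
    """Build a headless Service manifest that selects a single executor pod.

    Hypothetical helper: it only illustrates the shape of the service record,
    it is not the proposed ExecutorServiceFeatureStep itself.
    """
    # Hypothetical naming scheme; Kubernetes limits service names to 63 characters.
    name = f"{app_id}-exec-{executor_id}-svc"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Headless service: DNS resolves straight to the pod IP.
            "clusterIP": "None",
            # Assumed to match the labels Spark sets on executor pods.
            "selector": {
                "spark-app-selector": app_id,
                "spark-role": "executor",
                "spark-exec-id": str(executor_id),
            },
            "ports": [{"name": "blockmanager", "port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(executor_service_manifest("spark-abc", 1, 7079), indent=2))
```

With one such record per executor, each pod is reachable through a service hostname, so peers would have a service identity to associate with it instead of a bare pod IP.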

      Allowing binding to all IPs

      Before Istio 1.10, the istio-proxy sidecar forwarded traffic from outside the pod to the pod's localhost. If the application container bound only to the pod IP, the traffic was therefore not forwarded to it. This was addressed in 1.10 (https://istio.io/latest/blog/2021/upcoming-networking-changes). However, the old behavior is still available by disabling the feature flag PILOT_ENABLE_INBOUND_PASSTHROUGH, and a request to remove the flag has met some pushback (https://github.com/istio/istio/issues/37642). In the current implementation, the Spark K8s backend does not allow a bind address to be passed for the driver (https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala#L35). As part of this Jira we want to allow a bind address to be passed in Kubernetes mode as long as it is 0.0.0.0. This lets users choose the behavior that matches the state of PILOT_ENABLE_INBOUND_PASSTHROUGH in their Istio cluster.
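
The relaxed rule can be sketched as a small validation function. This is an illustration in Python, not the Scala change itself; the function and constant names are hypothetical, and the fallback to the pod IP reflects the current behavior as described above.

```python
from typing import Optional

# The only user-supplied bind address the proposal would accept in Kubernetes mode.
WILDCARD_BIND_ADDRESS = "0.0.0.0"


def resolve_driver_bind_address(user_bind_address: Optional[str], pod_ip: str) -> str:
    """Return the address the driver should bind to (hypothetical helper).

    - No user setting: keep today's behavior and bind to the pod IP.
    - 0.0.0.0: allowed, so the driver listens on all interfaces, including
      localhost, which is where a pre-1.10-style sidecar forwards traffic.
    - Anything else: rejected, as it is today.
    """
    if user_bind_address is None:
        return pod_ip
    if user_bind_address != WILDCARD_BIND_ADDRESS:
        raise ValueError(
            f"bind address {user_bind_address!r} is not supported in Kubernetes mode; "
            f"only {WILDCARD_BIND_ADDRESS} may be set explicitly"
        )
    return user_bind_address
```

A cluster with PILOT_ENABLE_INBOUND_PASSTHROUGH disabled would set the bind address to 0.0.0.0; a cluster with the default 1.10+ behavior could leave it unset.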

      Better support for istio-proxy sidecar lifecycle management

      In an Istio-enabled cluster, istio-proxy sidecars are auto-injected into the driver and executor pods. If the application is ephemeral, the driver and executor containers exit when they finish, but the istio-proxy container keeps running. This causes the driver/executor pods to enter the NotReady state. As part of this Jira we want the ability to run a post-stop cleanup after the driver/executor container completes. Similarly, we want to add support for a pre-start script that can ensure, for example, that the istio-proxy sidecar is up before the executor/driver container starts.
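
One possible shape for these hooks is a wrapper around the container's main process, sketched below in Python. It assumes the sidecar exposes a readiness endpoint on port 15021 and a `quitquitquit` shutdown endpoint on port 15020, which is the commonly used Istio workaround; confirm both against the Istio version in the cluster. All function names are hypothetical, and the hooks are injectable so the flow can be exercised without a sidecar.

```python
import subprocess
import time
import urllib.request
from typing import Callable, Sequence

# Assumed istio-proxy endpoints; verify against the deployed Istio version.
READY_URL = "http://127.0.0.1:15021/healthz/ready"
QUIT_URL = "http://127.0.0.1:15020/quitquitquit"


def sidecar_ready(url: str = READY_URL) -> bool:
    """Pre-start check: True once the sidecar answers its readiness probe."""
    try:
        with urllib.request.urlopen(url, timeout=1) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, HTTP error
        return False


def stop_sidecar(url: str = QUIT_URL) -> None:
    """Post-stop cleanup: ask the sidecar to exit so the pod can complete."""
    req = urllib.request.Request(url, data=b"", method="POST")
    try:
        urllib.request.urlopen(req, timeout=2).close()
    except OSError:
        pass  # best effort; the sidecar may already be gone


def run_with_sidecar(
    command: Sequence[str],
    is_ready: Callable[[], bool] = sidecar_ready,
    stop: Callable[[], None] = stop_sidecar,
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
) -> int:
    """Wait for the sidecar, run the main process, then shut the sidecar down."""
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError("istio-proxy sidecar did not become ready in time")
        time.sleep(poll_s)
    try:
        return subprocess.run(list(command)).returncode
    finally:
        stop()  # runs whether the main process succeeded or failed
```

Running the cleanup in a `finally` block means the sidecar is asked to exit even when the driver or executor process fails, which is the case that otherwise leaves the pod stuck in NotReady.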

            People

              Assignee: Unassigned
              Reporter: Puneet (puneetguptanitj)