Spark / SPARK-49524 Improve K8s support / SPARK-49804

Always use the exit code of the executor container


Details

    Description

      When Spark pods are deployed on Kubernetes with sidecar containers, the executor exit code that Spark reports may be incorrect.

      For example, Spark reports exit code 0 for the executor, but the executor container actually exited with code 52 (OOM).

      2024-09-25 02:35:29,383 ERROR TaskSchedulerImpl: org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:972) - Lost executor 1 on XXXXX: The executor with id 1 exited with exit code 0(success).
        
      The API gave the following container statuses:
       
           container name: fluentd
           container image: docker-images-release.XXXXX.com/XXXXX/fluentd:XXXXX
           container state: terminated
           container started at: 2024-09-25T02:32:17Z
           container finished at: 2024-09-25T02:34:52Z
           exit code: 0
           termination reason: Completed
       
           container name: istio-proxy
           container image: docker-images-release.XXXXX.com/XXXXX-istio/proxyv2:XXXXX
           container state: running
           container started at: 2024-09-25T02:32:16Z
       
           container name: spark-kubernetes-executor
           container image: docker-dev-artifactory.XXXXX.com/XXXXX/spark-XXXXX:XXXXX
           container state: terminated
           container started at: 2024-09-25T02:32:17Z
           container finished at: 2024-09-25T02:35:28Z
           exit code: 52
           termination reason: Error 
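
The statuses above suggest one way the mismatch could arise: if the exit code is taken from the first terminated container in the pod, the fluentd sidecar's 0 is picked up instead of the executor's 52. The following is a minimal sketch of that idea, not the actual Spark code. `ContainerStatus`, `ExecutorExitCode`, and `Example` are hypothetical, simplified types introduced only for illustration (they are not the Kubernetes client API), and the executor container name is assumed from the listing above.

```scala
// Hypothetical, simplified model of a container status (not the Kubernetes client API).
// exitCode is None while the container is still running.
final case class ContainerStatus(name: String, exitCode: Option[Int])

object ExecutorExitCode {
  // Assumption: the executor container name, as it appears in the statuses above.
  val ExecutorContainerName: String = "spark-kubernetes-executor"

  // Naive lookup: exit code of the first terminated container, whichever it is.
  def firstTerminated(statuses: Seq[ContainerStatus]): Option[Int] =
    statuses.flatMap(_.exitCode).headOption

  // Name-based lookup: only the executor container's exit code is considered.
  def ofExecutor(
      statuses: Seq[ContainerStatus],
      containerName: String = ExecutorContainerName): Option[Int] =
    statuses.find(_.name == containerName).flatMap(_.exitCode)
}

object Example {
  // The three container statuses from the report, reduced to name and exit code.
  val statuses: Seq[ContainerStatus] = Seq(
    ContainerStatus("fluentd", Some(0)),
    ContainerStatus("istio-proxy", None), // still running
    ContainerStatus("spark-kubernetes-executor", Some(52))
  )
}
```

With these statuses, `ExecutorExitCode.firstTerminated(Example.statuses)` yields the sidecar's exit code, while `ExecutorExitCode.ofExecutor(Example.statuses)` yields the executor's. Matching on the container name keeps the result independent of the order in which the API lists the containers.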


            People

              fe2s Oleksiy Dyagilev