SPARK-35428

Spark history Server to S3 doesn't show incomplete applications

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None
    • Environment: Jupyter Notebook (sparkmagic) with Spark 2.4.5 in client mode, running on Kubernetes
    • Flags: Important

    Description

      I am running Jupyter Notebook with sparkmagic against Spark 2.4.5 in client mode on Kubernetes. I am redirecting the Spark event logs to S3 with the following configuration:
       
      spark.eventLog.enabled = true
      spark.history.ui.port = 18080
      spark.eventLog.dir = s3://livy-spark-log/spark-history/
      spark.history.fs.logDirectory = s3://livy-spark-log/spark-history/
      spark.history.fs.update.interval = 5s

      spark.eventLog.buffer.kb = 1k

       

      spark.streaming.driver.writeAheadLog.closeFileAfterWrite = true
      spark.streaming.receiver.writeAheadLog.closeFileAfterWrite = true
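
For reference, the event log settings above can be sanity-checked programmatically. The following is a minimal Python sketch, not part of the original setup: it parses `key = value` lines and confirms that the directory the application writes to (`spark.eventLog.dir`) is the same one the history server reads (`spark.history.fs.logDirectory`). The helper names `parse_conf` and `log_dirs_match` are hypothetical.

```python
# Event log settings copied from the report above.
CONF_TEXT = """
spark.eventLog.enabled = true
spark.history.ui.port = 18080
spark.eventLog.dir = s3://livy-spark-log/spark-history/
spark.history.fs.logDirectory = s3://livy-spark-log/spark-history/
spark.history.fs.update.interval = 5s
spark.eventLog.buffer.kb = 1k
"""


def parse_conf(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def log_dirs_match(conf):
    """Return True if the writer and the history server use the same directory."""
    written = conf.get("spark.eventLog.dir", "").rstrip("/")
    read = conf.get("spark.history.fs.logDirectory", "").rstrip("/")
    return bool(written) and written == read


conf = parse_conf(CONF_TEXT)
print(log_dirs_match(conf))
```

With the values from this report, both settings point at the same S3 prefix, so a directory mismatch is not what is hiding the running applications.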

       
       
      Once my application has completed, it shows up on the Spark History Server. However, running applications do not show up under "incomplete applications". I have also checked the log: whenever my application ends, I see these messages:
       
      21/05/17 06:14:18 INFO k8s.KubernetesClusterSchedulerBackend: Shutting down all executors
      21/05/17 06:14:18 INFO k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
      21/05/17 06:14:18 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
      21/05/17 06:14:18 INFO s3n.MultipartUploadOutputStream: close closed:false s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6.inprogress
      21/05/17 06:14:19 INFO s3n.S3NativeFileSystem: rename s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6.inprogress s3://livy-spark-log/spark-history/spark-48c3141875fe4c67b5708400134ea3d6
      21/05/17 06:14:19 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
      21/05/17 06:14:19 INFO memory.MemoryStore: MemoryStore cleared
      21/05/17 06:14:19 INFO storage.BlockManager: BlockManager stopped
       
       
      I am not able to see any xx.inprogress file on S3, though. Has anyone had this problem before? Otherwise, I would treat it as a bug.
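
The shutdown log above shows the event log being written as `<name>.inprogress` and renamed to `<name>` when the application ends. As a rough illustration of what to look for in the bucket, the Python sketch below splits a list of object keys into in-progress and completed event logs. It assumes the keys were obtained separately (for example from a bucket listing). The function name `split_event_logs` is hypothetical, and so is the second sample key.

```python
INPROGRESS_SUFFIX = ".inprogress"


def split_event_logs(keys):
    """Split object keys into (in-progress, completed) event log names."""
    in_progress, completed = [], []
    for key in keys:
        # Keep only the last path component of the key.
        name = key.rsplit("/", 1)[-1]
        if not name:
            continue  # skip directory placeholders such as "spark-history/"
        if name.endswith(INPROGRESS_SUFFIX):
            in_progress.append(name[: -len(INPROGRESS_SUFFIX)])
        else:
            completed.append(name)
    return in_progress, completed


keys = [
    "spark-history/",
    # Completed log, name taken from the rename message in the log above.
    "spark-history/spark-48c3141875fe4c67b5708400134ea3d6",
    # Hypothetical key: what a running application's log would look like.
    "spark-history/spark-0000000000000000000000000000000a.inprogress",
]
running, finished = split_event_logs(keys)
print(running)
print(finished)
```

In the situation described in this report, the first list would stay empty while the application is running, because no `.inprogress` object is visible in the bucket.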

      Attachments

        1. image-2022-08-03-12-03-39-533.png
          24 kB
          Rostislav Nedelchev

          People

            Assignee: Unassigned
            Reporter: Tianbin Jiang (jtianbin)
            Apache Spark
            Votes: 0
            Watchers: 7
