Spark / SPARK-32500

Query and Batch Id not set for Structured Streaming Jobs in case of ForeachBatch in PySpark


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.6
    • None
    • None

    Description

      Query ID and batch ID information is not available for jobs started by a Structured Streaming query when the foreachBatch API is used in PySpark.

      This happens only with foreachBatch in PySpark. foreachBatch in Scala works fine, as do the other Structured Streaming sinks in PySpark. I am attaching screenshots of the Jobs page.

      I think the job group is not set properly when foreachBatch is used via PySpark. I have a framework that depends on the query ID and batch ID information available in the job properties, so my framework does not work for the PySpark foreachBatch use case.
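      A minimal reproduction sketch, assuming PySpark 2.4 or later and assuming the query ID and batch ID are carried as the SparkContext local properties `sql.streaming.queryId` and `streaming.sql.batchId` (the keys used by Spark's Scala micro-batch engine; treat them as an assumption here). The names `missing_streaming_props` and `run_repro` are hypothetical. Inside the foreachBatch function the sketch reads those properties, prints which are missing, and then runs a job on the batch so it also shows up on the Jobs page.

```python
# Local-property keys assumed to carry the streaming query ID and batch ID
# (taken from Spark's Scala micro-batch engine; an assumption in this sketch).
QUERY_ID_KEY = "sql.streaming.queryId"
BATCH_ID_KEY = "streaming.sql.batchId"


def missing_streaming_props(props):
    """Return the streaming keys whose value is absent, None, or empty."""
    return [key for key in (QUERY_ID_KEY, BATCH_ID_KEY) if not props.get(key)]


def run_repro(run_seconds=20):
    """Hypothetical repro: start a foreachBatch query and print, per batch,
    which streaming local properties are visible from the batch function."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("SPARK-32500-repro")
             .getOrCreate())
    sc = spark.sparkContext

    def check_batch(batch_df, batch_id):
        # Read the local properties visible to this function via the SparkContext.
        props = {key: sc.getLocalProperty(key) for key in (QUERY_ID_KEY, BATCH_ID_KEY)}
        print("batch", batch_id, props, "missing:", missing_streaming_props(props))
        # Launch a job so it also appears on the Jobs page of the Spark UI.
        batch_df.count()

    query = (spark.readStream.format("rate").option("rowsPerSecond", 1).load()
             .writeStream.foreachBatch(check_batch).start())
    try:
        # Let a few micro-batches run; awaitTermination(timeout) returns after the timeout.
        query.awaitTermination(run_seconds)
    finally:
        query.stop()
        spark.stop()
```

      Call `run_repro()` in an environment with PySpark installed, then compare the printed properties, and the job properties shown on the Jobs page, with the same query written in Scala.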

       

      Attachments

        1. Screen Shot 2020-07-30 at 9.04.21 PM.png (153 kB, Abhishek Dixit)
        2. image-2020-08-01-10-21-51-246.png (56 kB, JinxinTang)
        3. Screen Shot 2020-07-26 at 6.50.39 PM.png (185 kB, Abhishek Dixit)


          People

            Assignee: Unassigned
            Reporter: Abhishek Dixit (abhishekd0907)
            Votes: 0
            Watchers: 4
