Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.6
None
None
Description
Query ID and batch ID information is not available for jobs started by a Structured Streaming query when the foreachBatch API is used in PySpark.
This happens only with foreachBatch in PySpark. foreachBatch in Scala works fine, and the other Structured Streaming sinks in PySpark also work fine. I am attaching a screenshot of the Jobs page.
I think the job group is not set properly when foreachBatch is used via PySpark. I have a framework that depends on the queryId and batchId information being available in the job properties, so it does not work for the PySpark foreachBatch use case.
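
A minimal reproduction sketch may help here. It is not taken from the reporter's framework. It assumes the streaming properties are stored under the keys "sql.streaming.queryId" and "streaming.sql.batchId" (the names I believe Spark's Scala code uses; please verify them against your Spark version), and the names missing_streaming_properties, run_repro, and handle_batch are hypothetical. It also assumes that reading local properties from inside the foreachBatch callback reflects what ends up on the jobs that callback triggers, which may not hold exactly.

```python
# Assumed property keys; verify against the Spark version in use.
QUERY_ID_KEY = "sql.streaming.queryId"
BATCH_ID_KEY = "streaming.sql.batchId"


def missing_streaming_properties(get_property):
    """Return the expected keys for which get_property(key) is None.

    get_property is any callable that maps a key to a value or None,
    for example SparkContext.getLocalProperty or dict.get.
    """
    return [key for key in (QUERY_ID_KEY, BATCH_ID_KEY)
            if get_property(key) is None]


def run_repro(num_seconds=10):
    """Run a short rate-source stream and report missing properties per batch."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("foreachBatch-repro")
             .getOrCreate())
    sc = spark.sparkContext

    def handle_batch(batch_df, batch_id):
        # Inspect the local properties visible when the batch function runs.
        missing = missing_streaming_properties(sc.getLocalProperty)
        print("batch %s: missing properties: %s" % (batch_id, missing))
        # Trigger a job so it also shows up on the Jobs page for comparison.
        batch_df.count()

    query = (spark.readStream
             .format("rate")
             .option("rowsPerSecond", 1)
             .load()
             .writeStream
             .foreachBatch(handle_batch)
             .start())

    query.awaitTermination(num_seconds)  # timeout in seconds
    query.stop()
    spark.stop()
```

Calling run_repro() on a machine with PySpark installed should print one line per micro-batch. If the report above is accurate, the jobs triggered by batch_df.count() would also show no query ID or batch ID on the Jobs page.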