Details
Bug
Status: Resolved
Major
Resolution: Fixed
4.0.0
None
Description
While tracing method calls for a DSv2 streaming source, I found that planInputPartitions is called 4 times per microbatch.
It turned out that the repeated calls to planInputPartitions come from `DataSourceV2ScanExecBase.supportsColumnar`. That method reaches planInputPartitions through `MicroBatchScanExec.inputPartitions`, which is defined as lazy, so repeated planning should not happen.
The behavior seems to be coupled with Catalyst, and the root cause is very hard to pin down, but with SPARK-44505 we can at least fix this for each data source individually.
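To illustrate the per-data-source workaround, here is a minimal sketch in plain Java with no Spark dependency. Every type in it (`SupportMode`, `CountingSource`, `ScanNode`) is a hypothetical stand-in, not the Spark API. It assumes that SPARK-44505 lets a scan declare its columnar support up front (as far as I understand, through `Scan.columnarSupportMode()`), so that the columnar check no longer needs to look at the planned partitions. The sketch only models the observed effect (one planning call per columnar check). It does not reproduce why the lazy value is evaluated more than once.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ColumnarSupportSketch {
    // Hypothetical stand-in for the modes a scan could declare.
    enum SupportMode { PARTITION_DEFINED, SUPPORTED, UNSUPPORTED }

    // Hypothetical stand-in for a streaming source; counts planning calls.
    static class CountingSource {
        final AtomicInteger planCalls = new AtomicInteger();

        List<String> planInputPartitions() {
            planCalls.incrementAndGet();
            return List.of("partition-0", "partition-1");
        }
    }

    // Hypothetical stand-in for the physical scan node.
    static class ScanNode {
        private final CountingSource source;
        private final SupportMode mode;

        ScanNode(CountingSource source, SupportMode mode) {
            this.source = source;
            this.mode = mode;
        }

        boolean supportsColumnar() {
            switch (mode) {
                case SUPPORTED:
                    return true;   // declared up front, no planning needed
                case UNSUPPORTED:
                    return false;  // declared up front, no planning needed
                default:
                    // Partition-defined: the partitions must be planned
                    // before they can be inspected. These row-based
                    // partitions are treated as non-columnar.
                    source.planInputPartitions();
                    return false;
            }
        }
    }

    // Runs the columnar check `checks` times and reports how many
    // planning calls the source received.
    static int planCallsAfterChecks(SupportMode mode, int checks) {
        CountingSource source = new CountingSource();
        ScanNode node = new ScanNode(source, mode);
        for (int i = 0; i < checks; i++) {
            node.supportsColumnar();
        }
        return source.planCalls.get();
    }

    public static void main(String[] args) {
        // Partition-defined mode: every check triggers planning.
        System.out.println(planCallsAfterChecks(SupportMode.PARTITION_DEFINED, 4)); // prints 4
        // Mode declared by the source: no planning is triggered.
        System.out.println(planCallsAfterChecks(SupportMode.UNSUPPORTED, 4)); // prints 0
    }
}
```

In this model, a source that declares its mode explicitly is never asked to plan partitions just to answer the columnar check, which is the kind of per-source fix suggested above.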