Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
4.0.0, 3.5.2, 3.4.4
Description
This bug is introduce from Apache Spark 3.3.0.
OptimizeOneRowPlan aggressively rewrites operators or removes operators if the rule figures out that the operator has a chance to be optimized with the stats that there will be max 1 row in the input.
This is problematic for streaming, because we aren't seeing the whole data but a part of data in the current microbatch and optimizer is not aware of this.
There are various viable approaches to deal with, but maybe fixing the rule to disable this effectively with streaming DataFrame would have least effect.
(There is a separate wider effort to achieve better stability between QO and streaming. Since it would take a considerable time, we still need point fixes during the time.)
Attachments
Issue Links
- links to