Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
4.0.0
Description
PruneFilters replaces the null / false filter with an empty relation, which means the subtree of the filter is also lost. The optimization does not care about whichever operator is in the subtree, hence some important operators like stateful operator, watermark node, observe node could be lost.
The filter could be evaluated to null / false selectively among microbatches in various reasons (one simple example is the modification of the query during restart), which means stateful operator might not be available for batch N and be available for batch N + 1. For this case, streaming query will fail as batch N + 1 cannot load the state from batch N, and it's not recoverable in most cases.
We have to disable the rule for streaming workloads, with the consideration of backward compatibility - we should avoid breaking existing query.