Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 3.5.0
Description
Spark can produce incorrect results when using a checkpointed DataFrame with a filter containing a scalar subquery. This subquery is included in the constraints of the resulting LogicalRDD, and may then be propagated as a filter when joining with the checkpointed DataFrame. This causes the subquery to be evaluated twice: once during checkpointing and once while evaluating the query. These two subquery evaluations may return different results, e.g. when the subquery contains a limit with an underspecified sort order.
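A minimal reproduction sketch in Scala, assuming a local SparkSession with a checkpoint directory set. The `events` table, its columns, and the checkpoint path are illustrative, not taken from this report; the point is only that the scalar subquery's result is not uniquely determined (LIMIT 1 over tied sort keys), so evaluating it once at checkpoint time and again via the propagated constraint at join time can yield different rows.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session and checkpoint location.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("checkpoint-subquery-sketch")
  .getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

import spark.implicits._

// Two rows tie on `ts`, so ORDER BY ts LIMIT 1 is underspecified:
// either id 1 or id 2 may be returned on any given evaluation.
Seq((1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02"))
  .toDF("id", "ts")
  .createOrReplaceTempView("events")

// Filter containing a scalar subquery with an underspecified sort order.
val filtered = spark.sql(
  """SELECT * FROM events
    |WHERE id = (SELECT id FROM events ORDER BY ts LIMIT 1)""".stripMargin)

// Checkpointing evaluates the subquery once; the subquery expression is
// also carried in the constraints of the resulting LogicalRDD.
val checkpointed = filtered.checkpoint()

// Joining with the checkpointed DataFrame may propagate that constraint as a
// filter on the other side, re-evaluating the subquery. If the second
// evaluation picks the other tied row, matching rows are silently dropped.
val joined = checkpointed.as("c")
  .join(spark.table("events").as("e"), $"c.id" === $"e.id")

joined.show()
```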
Issue Links
- links to