Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-10740

handle nondeterministic expressions correctly for set operations

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: SQL
    • Labels:
      None

      Description

      We should only push down deterministic filter condition to set operator.
      For Union, let's say we do a non-deterministic filter on 1...5 union 1...5, and we may get 1,3 for the left side and 2,4 for the right side, then the result should be 1,3,2,4. If we push down this filter, we get 1,3 for both side(we create a new random object with same seed in each side) and the result would be 1,3,1,3.
      For Intersect, let's say there is a non-deterministic condition with a 0.5 possibility to accept a row and we have a row that presents in both sides of an Intersect. Once we push down this condition, the possibility to accept this row will be 0.25.
      For Except, let's say there is a row that presents in both sides of an Except. This row should not be in the final output. However, if we pushdown a non-deterministic condition, it is possible that this row is rejected from one side and then we output a row that should not be a part of the result.

      We should only push down deterministic projection to Union.

        Attachments

          Activity

            People

            • Assignee:
              cloud_fan Wenchen Fan
              Reporter:
              cloud_fan Wenchen Fan
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: