  1. Spark
  2. SPARK-35553 Improve correlated subqueries
  3. SPARK-38155

Disallow distinct aggregate in lateral subqueries with unsupported correlated predicates

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Block lateral subqueries in CheckAnalysis that contain a DISTINCT aggregate together with correlated non-equality predicates. Such queries can return incorrect results because DISTINCT is rewritten as an Aggregate during the optimization phase.

      For example:

      CREATE VIEW t1(c1, c2) AS VALUES (0, 1);
      CREATE VIEW t2(c1, c2) AS VALUES (1, 2), (2, 2);
      SELECT * FROM t1 JOIN LATERAL (SELECT DISTINCT c2 FROM t2 WHERE c1 > t1.c1);

      The correct result is [(0, 1, 2)], but the query currently returns [(0, 1, 2), (0, 1, 2)].
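
The discrepancy in this example can be reproduced outside Spark with a small simulation. The sketch below is plain Python, not Spark code, and the function names `lateral_reference` and `lateral_decorrelated` are hypothetical. It rests on one assumption that the issue itself does not spell out: once DISTINCT becomes an Aggregate and the correlated predicate is evaluated above it, the inner column `c1` has to be kept, so rows end up deduplicated on (c1, c2) rather than on c2 alone.

```python
# Rows of the two views from the example in this issue.
t1 = [(0, 1)]
t2 = [(1, 2), (2, 2)]


def lateral_reference(t1_rows, t2_rows):
    """Evaluate the subquery once per outer row (the intended semantics)."""
    out = []
    for o_c1, o_c2 in t1_rows:
        # DISTINCT applies to c2 only, after the correlated filter c1 > t1.c1.
        distinct_c2 = sorted({c2 for c1, c2 in t2_rows if c1 > o_c1})
        out.extend((o_c1, o_c2, c2) for c2 in distinct_c2)
    return out


def lateral_decorrelated(t1_rows, t2_rows):
    """Hypothetical rewrite: deduplicate first, apply the predicate in the join.

    Assumption: to evaluate c1 > t1.c1 after the Aggregate, the inner c1
    column must survive it, so deduplication runs on (c1, c2), not on c2.
    """
    deduped = sorted(set(t2_rows))
    return [
        (o_c1, o_c2, c2)
        for o_c1, o_c2 in t1_rows
        for c1, c2 in deduped
        if c1 > o_c1
    ]


print(lateral_reference(t1, t2))     # [(0, 1, 2)]
print(lateral_decorrelated(t1, t2))  # [(0, 1, 2), (0, 1, 2)]
```

Under the same assumption, an equality predicate such as c1 = t1.c1 would not show the problem: every matching inner row shares one c1 value per outer row, so deduplicating on (c1, c2) and on c2 give the same rows.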

          People

            Assignee: allisonwang-db Allison Wang
            Reporter: allisonwang-db Allison Wang
            Votes: 0
            Watchers: 3