Spark / SPARK-16804

Correlated subqueries containing non-deterministic operators return incorrect results

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Correlated subqueries with LIMIT can return incorrect results. The rule ResolveSubquery in the Analysis phase moves correlated predicates out of the subquery and into the join predicate, ignoring the semantics of LIMIT: the limit is then applied before the correlated predicate instead of after it.

      Example:

      Seq(1, 2).toDF("c1").createOrReplaceTempView("t1")
      Seq(1, 2).toDF("c2").createOrReplaceTempView("t2")
      
      sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT 1)").show
      +---+
      | c1|
      +---+
      |  1|
      +---+
      

      The correct result contains both rows from t1 (c1 = 1 and c1 = 2), because each value has a matching row in t2.
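
The difference between the two evaluation orders can be reproduced without Spark. The following is a minimal sketch using plain Scala collections in place of the temp views; the object name `LimitPullUpSketch` and its members are hypothetical, and the code only models the order of filter and limit described above, not Spark's actual plan or the ResolveSubquery implementation.

```scala
object LimitPullUpSketch {
  // Plain collections standing in for the temp views t1 and t2.
  val t1: Seq[Int] = Seq(1, 2)
  val t2: Seq[Int] = Seq(1, 2)

  // Intended semantics: for each outer row, apply the correlated
  // predicate (t1.c1 = t2.c2) first, then LIMIT 1 on the matches.
  val correct: Seq[Int] =
    t1.filter(c1 => t2.filter(_ == c1).take(1).nonEmpty)

  // Modeled faulty rewrite: LIMIT 1 runs on t2 once, and the correlated
  // predicate is only evaluated afterwards as a join condition.
  val limitedT2: Seq[Int] = t2.take(1)
  val incorrect: Seq[Int] =
    t1.filter(c1 => limitedT2.contains(c1))

  def main(args: Array[String]): Unit = {
    println(s"filter then limit: $correct")
    println(s"limit then filter: $incorrect")
  }
}
```

In this model, limiting first leaves a single row of t2, so only the outer row that happens to match it survives, which is consistent with the single-row output shown in the example.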

            People

              Assignee: nsyca Nattavut Sutyanyong
              Reporter: nsyca Nattavut Sutyanyong
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: 72h
                  Remaining Estimate: 72h
                  Time Spent: Not Specified