Spark / SPARK-23087

CheckCartesianProduct too restrictive when condition is constant folded to false/null

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Running

      sql("SELECT id as a FROM RANGE(10)").createOrReplaceTempView("A")
      sql("SELECT NULL as a FROM RANGE(10)").createOrReplaceTempView("NULLTAB")
      sql("SELECT 1 as goo FROM A LEFT OUTER JOIN NULLTAB ON A.a = NULLTAB.a").collect()
      

      results in:

      org.apache.spark.sql.AnalysisException: Detected cartesian product for LEFT OUTER join between logical plans
      Project
      +- Range (0, 10, step=1, splits=None)
      and
      Project
      +- Range (0, 10, step=1, splits=None)
      Join condition is missing or trivial.
      Use the CROSS JOIN syntax to allow cartesian products between these relations.;
        at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1121)
      

      This happens because NULLTAB.a is constant-folded to null, after which NullPropagation folds the whole join condition (a = null) to null:

      === Applying Rule org.apache.spark.sql.catalyst.optimizer.NullPropagation ===
      GlobalLimit 21                                      
       +- LocalLimit 21                                    
          +- Project [1 AS goo#28]                         
      !      +- Join LeftOuter, (a#0L = null)              
                :- Project [id#1L AS a#0L]                 
                :  +- Range (0, 10, step=1, splits=None)   
                +- Project                                  
                   +- Range (0, 10, step=1, splits=None) 
      
      GlobalLimit 21
      +- LocalLimit 21
         +- Project [1 AS goo#28]
            +- Join LeftOuter, null
               :- Project [id#1L AS a#0L]
               :  +- Range (0, 10, step=1, splits=None)
               +- Project
                  +- Range (0, 10, step=1, splits=None)
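
      The folding step in the trace above can be modeled with a toy expression tree. This is only a sketch: Attr, Lit, EqualTo and propagateNull are hypothetical names made up for illustration, not Catalyst's classes or the actual NullPropagation rule. It shows just the one behavior visible in the trace, where an equality with a null literal operand collapses to a null literal.

```scala
// Toy expression tree; NOT Catalyst's classes, only a model of the folding step.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Option[Long]) extends Expr // None models SQL NULL
case class EqualTo(left: Expr, right: Expr) extends Expr

object NullFoldSketch {
  val NullLit: Expr = Lit(None)

  // Equality is null-intolerant: if either operand is a NULL literal,
  // the whole comparison is replaced by a NULL literal.
  def propagateNull(e: Expr): Expr = e match {
    case EqualTo(l, r) if l == NullLit || r == NullLit => NullLit
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val cond = EqualTo(Attr("a"), Lit(None)) // models (a#0L = null)
    println(propagateNull(cond))             // prints Lit(None)
  }
}
```

      After this rewrite the join condition no longer references either side of the join, which is presumably why the check treats it as "missing or trivial".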
      

      CheckCartesianProducts then rejects the plan, even though the condition does not produce a cartesian product: it evaluates to null, so no pair of rows matches.
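
      To make the row counts concrete, here is a small plain-Scala model of a left outer join under SQL three-valued logic. It is a sketch under stated assumptions, not Spark's implementation: sqlEq and leftOuterJoin are hypothetical helpers, None stands in for SQL NULL, and the inputs mirror the views A and NULLTAB from the reproduction.

```scala
object LeftOuterNullSketch {
  // SQL three-valued equality: None models NULL (unknown).
  def sqlEq(l: Option[Long], r: Option[Long]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // Nested-loop left outer join: a pair matches only when the condition is TRUE;
  // a left row with no match is kept once, padded with NULL on the right.
  def leftOuterJoin(
      left: Seq[Option[Long]],
      right: Seq[Option[Long]]): Seq[(Option[Long], Option[Long])] =
    left.flatMap { l =>
      val matches = right.filter(r => sqlEq(l, r).contains(true))
      if (matches.isEmpty) Seq((l, None)) else matches.map(r => (l, r))
    }

  def main(args: Array[String]): Unit = {
    val a = (0L until 10L).map(Option(_))          // models view A: ids 0..9
    val nullTab = Seq.fill(10)(Option.empty[Long]) // models view NULLTAB: ten NULLs
    val joined = leftOuterJoin(a, nullTab)
    println(joined.size) // prints 10; a cartesian product would have 100 rows
  }
}
```

      In this model the join returns one null-padded row per left row, so the result has 10 rows rather than the 100 a true cartesian product would produce.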

            People

            • Assignee: Marco Gaido (mgaido)
            • Reporter: Juliusz Sompolski (juliuszsompolski)