Spark / SPARK-23087

CheckCartesianProduct too restrictive when condition is constant folded to false/null

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Running

      sql("SELECT id as a FROM RANGE(10)").createOrReplaceTempView("A")
      sql("SELECT NULL as a FROM RANGE(10)").createOrReplaceTempView("NULLTAB")
      sql("SELECT 1 as goo FROM A LEFT OUTER JOIN NULLTAB ON A.a = NULLTAB.a").collect()
      

      results in:

      org.apache.spark.sql.AnalysisException: Detected cartesian product for LEFT OUTER join between logical plans
      Project
      +- Range (0, 10, step=1, splits=None)
      and
      Project
      +- Range (0, 10, step=1, splits=None)
      Join condition is missing or trivial.
      Use the CROSS JOIN syntax to allow cartesian products between these relations.;
        at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1121)
      

      This happens because NULLTAB.a is constant folded to null, after which NullPropagation folds the whole join condition (a = null) to null:

      === Applying Rule org.apache.spark.sql.catalyst.optimizer.NullPropagation ===
      GlobalLimit 21                                      
       +- LocalLimit 21                                    
          +- Project [1 AS goo#28]                         
      !      +- Join LeftOuter, (a#0L = null)              
                :- Project [id#1L AS a#0L]                 
                :  +- Range (0, 10, step=1, splits=None)   
                +- Project                                  
                   +- Range (0, 10, step=1, splits=None) 
      
      GlobalLimit 21
      +- LocalLimit 21
         +- Project [1 AS goo#28]
            +- Join LeftOuter, null
               :- Project [id#1L AS a#0L]
               :  +- Range (0, 10, step=1, splits=None)
               +- Project
                  +- Range (0, 10, step=1, splits=None)
      

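      CheckCartesianProducts then rejects the plan because it treats the folded condition as missing or trivial. The check is too restrictive here: a condition that evaluates to null (or false) never matches any pair of rows, so the left outer join returns each left row once, padded with nulls, rather than a cartesian product.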
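
      The semantics can be illustrated with a small plain-Scala model. This is only a sketch, not Spark code: NullJoinSketch, sqlEquals and leftOuterJoin are hypothetical names, and Option[Boolean] stands in for SQL's three-valued logic, with None playing the role of NULL.

```scala
// Hypothetical model, not Spark code: None plays the role of SQL NULL.
object NullJoinSketch {
  // SQL equality: NULL on either side yields NULL (None), never true or false.
  def sqlEquals(l: Option[Long], r: Option[Long]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // Left outer join: a pair is kept only when the condition is exactly true.
  // A left row without any match is emitted once, padded with NULL on the right.
  def leftOuterJoin(
      left: Seq[Option[Long]],
      right: Seq[Option[Long]]): Seq[(Option[Long], Option[Long])] =
    left.flatMap { l =>
      val matches = right.filter(r => sqlEquals(l, r).contains(true))
      if (matches.isEmpty) Seq((l, None)) else matches.map(r => (l, r))
    }

  // Stand-ins for the two temp views from the report.
  val a: Seq[Option[Long]] = (0L until 10L).map(Option(_)) // like view A
  val nullTab: Seq[Option[Long]] = Seq.fill(10)(None)      // like view NULLTAB

  val joined: Seq[(Option[Long], Option[Long])] = leftOuterJoin(a, nullTab)

  def main(args: Array[String]): Unit = {
    // A cartesian product would have 10 * 10 = 100 rows; this prints 10.
    println(joined.size)
  }
}
```

      In this model the join yields 10 rows, one per row of A, each with a null right side, which is the result the query in the report is expected to produce.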

          People

            Assignee: mgaido Marco Gaido
            Reporter: juliuszsompolski Juliusz Sompolski