Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-11111

Fast null-safe join

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.6.0
    • SQL
    • None

    Description

      Today, null safe joins are executed with a Cartesian product.

      scala> sqlContext.sql("select * from t a join t b on (a.i <=> b.i)").explain
      == Physical Plan ==
      TungstenProject [i#2,j#3,i#7,j#8]
       Filter (i#2 <=> i#7)
        CartesianProduct
         LocalTableScan [i#2,j#3], [[1,1]]
         LocalTableScan [i#7,j#8], [[1,1]]
      

      One option is to add this rewrite to the optimizer:

      select * 
      from t a 
      join t b 
        on coalesce(a.i, <default>) = coalesce(b.i, <default>) AND (a.i <=> b.i)
      

      Acceptance criteria: joins with only null safe equality should not result in a Cartesian product.

      Attachments

        Issue Links

          Activity

            People

              davies Davies Liu
              davies Davies Liu
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: