Spark / SPARK-43113

Codegen error when full outer join's bound condition has multiple references to the same stream-side column
Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.2, 3.4.0, 3.5.0
    • Fix Version/s: 3.3.3, 3.4.1, 3.5.0
    • Component/s: SQL
    • Labels: None

    Description

      Example #1 (sort merge join):

      create or replace temp view v1 as
      select * from values
      (1, 1),
      (2, 2),
      (3, 1)
      as v1(key, value);
      
      create or replace temp view v2 as
      select * from values
      (1, 22, 22),
      (3, -1, -1),
      (7, null, null)
      as v2(a, b, c);
      
      select *
      from v1
      full outer join v2
      on key = a
      and value > b
      and value > c;
      

      The sort merge join's generated code causes the following compilation error:

      org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 277, Column 9: Redefinition of local variable "smj_isNull_7"
      

      Example #2 (shuffle hash join):

      select /*+ SHUFFLE_HASH(v2) */ *
      from v1
      full outer join v2
      on key = a
      and value > b
      and value > c;
      

      The shuffle hash join's generated code causes the following compilation error:

      org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 174, Column 5: Redefinition of local variable "shj_value_1" 
      

      With the default configuration, both queries still succeed, because Spark falls back to running each query with whole-stage codegen disabled.
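Because of that fallback, the compilation error is easy to miss. The sketch below shows two session settings that may help when reproducing the issue. It assumes that `spark.sql.codegen.fallback` (an internal setting) controls the fallback described above and that `spark.sql.codegen.wholeStage` toggles whole-stage codegen. Neither setting is mentioned in the original report, so treat both as assumptions to check against the Spark version in use.

```sql
-- Option A (assumption): turn off the fallback so the CompileException
-- surfaces as a query failure instead of only being logged.
SET spark.sql.codegen.fallback=false;

-- Option B (assumption): on an affected version, skip whole-stage codegen
-- altogether so the failing generated code is never compiled.
-- Use this instead of Option A, not together with it.
SET spark.sql.codegen.wholeStage=false;
```

Option A is meant for confirming the bug. Option B is a possible workaround until a fixed version is available.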

      The issue happens only when the join's bound condition refers to the same stream-side column more than once.
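For contrast, here is a variant of Example #1 in which `value` appears only once in the non-equi part of the join condition. Going by the statement above, it should not trigger the error. This has not been verified here and is only an illustration of the stated trigger condition.

```sql
-- Reuses the temp views v1 and v2 defined in Example #1.
-- `value` is referenced once in the non-equi condition, so by the
-- description above the "Redefinition of local variable" error
-- is not expected.
select *
from v1
full outer join v2
on key = a
and value > b;
```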

          People

            bersprockets Bruce Robbins