[SPARK-43113] Codegen error when full outer join's bound condition has multiple references to the same stream-side column - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.2, 3.4.0, 3.5.0
Fix Version/s: 3.3.3, 3.4.1, 3.5.0
Component/s: SQL
Labels: None

Description

Example #1 (sort merge join):

create or replace temp view v1 as
select * from values
(1, 1),
(2, 2),
(3, 1)
as v1(key, value);

create or replace temp view v2 as
select * from values
(1, 22, 22),
(3, -1, -1),
(7, null, null)
as v2(a, b, c);

select *
from v1
full outer join v2
on key = a
and value > b
and value > c;

The join's generated code causes the following compilation error:

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 277, Column 9: Redefinition of local variable "smj_isNull_7"

Example #2 (shuffle hash join):

select /*+ SHUFFLE_HASH(v2) */ *
from v1
full outer join v2
on key = a
and value > b
and value > c;

The shuffle hash join's generated code causes the following compilation error:

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 174, Column 5: Redefinition of local variable "shj_value_1"

With default configuration, both queries end up succeeding, since Spark falls back to running each query with whole-stage codegen disabled.
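
To see the compilation error surface as a query failure rather than a silent fallback, one option is to turn the fallback off before running either example query. This is a minimal sketch, assuming the internal setting spark.sql.codegen.fallback (not mentioned in this report) is available in the affected version:

```sql
-- Assumption: spark.sql.codegen.fallback is an internal Spark SQL setting (default true).
-- With it set to false, a whole-stage codegen compile failure should fail the query
-- instead of quietly re-running it with whole-stage codegen disabled.
set spark.sql.codegen.fallback=false;
```

Re-running either example query afterwards should then report the CompileException instead of returning rows.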

The issue happens only when the join's bound condition refers to the same stream-side column more than once.
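
As a contrast case, here is a sketch of a variant of the shuffle hash join example whose condition mentions value only once. Going by the statement above, it would be expected to compile without the redefinition error; that expectation is an inference from this report, not a result recorded in it.

```sql
-- Reuses the temp views v1 and v2 defined in the description.
-- value is a column of v1, the stream side here because v2 is the hinted build side.
-- It appears only once in the non-equi part of the join condition.
select /*+ SHUFFLE_HASH(v2) */ *
from v1
full outer join v2
on key = a
and value > b;
```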

Issue Links

links to

[Github] Pull Request #40881 (bersprockets)

People

Assignee: Bruce Robbins

Reporter: Bruce Robbins

Votes: 0

Watchers: 3

Dates

Created: 12/Apr/23 22:33

Updated: 24/Apr/23 00:59

Resolved: 18/Apr/23 04:11