Description
Assume this data:
create or replace temp view v1 as
select * from values (interval '10' months, interval '10' day, 2) as v1(period, duration, num);

cache table v1;
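For reference, the same setup can be built from spark-shell. This is a minimal sketch assuming Spark 3.2+, where these interval literals map to the ANSI interval types (INTERVAL MONTH and INTERVAL DAY):

// Sketch: same view built from spark-shell (assumes the ANSI interval types
// introduced in Spark 3.2; column names match the SQL above).
spark.sql(
  """create or replace temp view v1 as
    |select * from values (interval '10' months, interval '10' day, 2)
    |  as v1(period, duration, num)""".stripMargin)
spark.sql("cache table v1")
// Expect period: interval month, duration: interval day, num: int
spark.table("v1").printSchema()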
These two queries work:
spark-sql> select period/num from v1;
0-5
Time taken: 0.143 seconds, Fetched 1 row(s)
spark-sql> select duration/num from v1;
5 00:00:00.000000000
Time taken: 0.094 seconds, Fetched 1 row(s)
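When a query does compile, the generated code can be inspected with Spark's debug helpers, which is useful for comparing the working and failing plans (a sketch using the built-in debug package):

// Print the whole-stage generated Java source for the working query.
import org.apache.spark.sql.execution.debug._
spark.sql("select period/num from v1").debugCodegen()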
However, these two queries get a codegen compilation error:
spark-sql> select period/(num + 3) from v1;
22/05/03 08:56:37 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 44: Expression "project_value_2" is not an rvalue
...
22/05/03 08:56:37 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode
...
0-2
Time taken: 0.149 seconds, Fetched 1 row(s)
spark-sql> select duration/(num + 3) from v1;
22/05/03 08:57:29 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 54: Expression "project_value_2" is not an rvalue
...
22/05/03 08:57:29 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode
...
2 00:00:00.000000000
Time taken: 0.089 seconds, Fetched 1 row(s)
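The failure is not specific to the spark-sql shell. A minimal sketch of the same reproduction through the DataFrame API (same view as above; it should hit the same expression codegen path):

// Same failing expression via the DataFrame API; the ERROR/WARN lines from
// CodeGenerator/UnsafeProjection should appear in the driver log before the
// interpreted fallback returns the row.
spark.table("v1").selectExpr("period / (num + 3)").show()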
Even the first two queries will get a compilation error if you turn off whole-stage codegen:
spark-sql> set spark.sql.codegen.wholeStage=false;
spark.sql.codegen.wholeStage	false
Time taken: 0.055 seconds, Fetched 1 row(s)
spark-sql> select period/num from v1;
22/05/03 09:16:42 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 37, Column 5: Expression "value_1" is not an rvalue
...
0-5
Time taken: 0.175 seconds, Fetched 1 row(s)
spark-sql> select duration/num from v1;
22/05/03 09:17:41 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 37, Column 5: Expression "value_1" is not an rvalue
...
5 00:00:00.000000000
Time taken: 0.104 seconds, Fetched 1 row(s)
Note that in the error cases the queries still return a correct result, because Spark falls back to interpreting the divide expression (which is why I marked this issue as "minor").
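For testing purposes, the silent fallback can be disabled so the compile error surfaces as a hard failure. This is a sketch assuming the internal config spark.sql.codegen.factoryMode (used by Spark's own test suites) applies to this code path:

// Force expression codegen with no interpreted fallback, so the compile
// error above fails the query instead of logging a WARN and falling back.
// (Internal/testing config; assumed to cover the UnsafeProjection path here.)
spark.conf.set("spark.sql.codegen.factoryMode", "CODEGEN_ONLY")
spark.sql("select period/num from v1").show()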