Description
This returns the wrong answer:
set spark.sql.codegen.wholeStage=false;
set spark.sql.codegen.factoryMode=NO_CODEGEN;

select first(col1), last(col2)
from values
  (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0))
  as data(col1, col2);

+---------------+---------------+
|first(col1)    |last(col2)     |
+---------------+---------------+
|16 years 2 days|16 years 2 days|
+---------------+---------------+
In the above case, TungstenAggregationIterator uses InterpretedUnsafeProjection to create the aggregation buffer and then initializes all the fields to null. InterpretedUnsafeProjection incorrectly calls UnsafeRowWriter#setNullAt, rather than UnsafeRowWriter#write, for the two calendar interval fields. As a result, the writer never allocates memory from the variable-length region for the two intervals, and the pointers in the fixed-length region are left as zero. Later, when InterpretedMutableProjection attempts to update the first field, UnsafeRow#setInterval picks up the zero pointer and stores the interval data on top of the null-tracking bit set. The call to UnsafeRow#setInterval for the second field also stomps the null-tracking bit set. Later updates to the null-tracking bit set (e.g., calls to setNotNullAt) further corrupt the interval data, turning interval 7 years 2 days into interval 16 years 2 days.
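The zero-pointer failure mode can be illustrated with a small self-contained Scala sketch. This is a simplified model of the UnsafeRow layout (null-tracking bit set, then one 8-byte word per field, then a variable-length region), not Spark's actual classes; the names and sizes below are illustrative:

```scala
// Model (illustrative, not Spark's code) of an UnsafeRow-style layout:
//   [ 8-byte null-tracking bit set | 8-byte word per field | variable-length region ]
// A calendar interval occupies 16 bytes in the variable-length region; the
// field's fixed-region word packs (offset << 32) | length pointing at it.
object UnsafeLayoutModel {
  val numFields = 2
  val bitSetBytes = 8
  val fixedBytes = numFields * 8

  def offsetOf(word: Long): Int = (word >>> 32).toInt
  def sizeOf(word: Long): Int = word.toInt

  def main(args: Array[String]): Unit = {
    // setNullAt only flips a bit in the null-tracking bit set; the pointer
    // word stays 0, which decodes to offset 0 -- the start of the row,
    // i.e., the null-tracking bit set itself.
    val uninitializedWord = 0L
    assert(offsetOf(uninitializedWord) == 0)

    // A correctly written interval points past the bit set and fixed region.
    val goodWord = ((bitSetBytes + fixedBytes).toLong << 32) | 16L
    assert(offsetOf(goodWord) == 24 && sizeOf(goodWord) == 16)

    // Writing 16 interval bytes at offset 0 overlaps the 8-byte bit set and
    // the first fixed-region word: the corruption described above.
    assert(16 > bitSetBytes)
  }
}
```

This is why the later setNotNullAt calls, which flip bits at the front of the row, end up rewriting the interval's months field.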
Even if you fix the above bug in InterpretedUnsafeProjection so that the buffer is created correctly, InterpretedMutableProjection has a bug similar to SPARK-41395, except this time for calendar interval data:
set spark.sql.codegen.wholeStage=false;
set spark.sql.codegen.factoryMode=NO_CODEGEN;

select first(col1), last(col2), max(col3)
from values
  (null, null, 1),
  (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3)
  as data(col1, col2, col3);

+---------------+---------------+---------+
|first(col1)    |last(col2)     |max(col3)|
+---------------+---------------+---------+
|16 years 2 days|16 years 2 days|3        |
+---------------+---------------+---------+
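The aliasing also explains why first(col1) and last(col2) come back with the same value: both uninitialized pointer words decode to offset 0, so both setInterval calls target the same bytes and the second write wins. A minimal Scala sketch of that aliasing, again using illustrative names rather than Spark's code (204 months is col2's 17 years):

```scala
// Model (illustrative, not Spark's code) of two uninitialized pointer-backed
// fields aliasing the same location: both zero words decode to offset 0.
object AliasingModel {
  def decodeOffset(word: Long): Int = (word >>> 32).toInt

  def main(args: Array[String]): Unit = {
    val row = new Array[Byte](64)          // stand-in row buffer
    val (w1, w2) = (0L, 0L)                // both fields left as setNullAt did
    assert(decodeOffset(w1) == decodeOffset(w2))  // both alias offset 0

    // Write each field's months value (little-endian int) at its decoded offset.
    def writeMonths(off: Int, months: Int): Unit =
      for (b <- 0 until 4) row(off + b) = ((months >> (8 * b)) & 0xFF).toByte

    writeMonths(decodeOffset(w1), 84)   // col1: 7 years  = 84 months
    writeMonths(decodeOffset(w2), 204)  // col2: 17 years = 204 months

    // Reading field 1 back now yields field 2's months: the values collide.
    val readBack = (0 until 4).map(b => (row(b) & 0xFF) << (8 * b)).sum
    assert(readBack == 204)
  }
}
```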
These two bugs can be exercised during codegen fallback. Take, for example, this case where I forced codegen to fail for the Greatest expression:
spark-sql> select first(col1), last(col2), max(col3) from values
         > (null, null, 1),
         > (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3)
         > as data(col1, col2, col3);
22/12/15 13:06:23 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 70, Column 1: ';' expected instead of 'if'
...
22/12/15 13:06:24 WARN MutableProjection: Expr codegen error and falling back to interpreter mode
java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 78, Column 1: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 78, Column 1: ';' expected instead of 'boolean'
...
16 years 2 days	16 years 2 days	3
Time taken: 5.852 seconds, Fetched 1 row(s)
spark-sql>