Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
This is similar to HIVE-15588. With a customer query, I reproduced a vectorized expression tree like the one below (I'll attach a simple repro query when possible):
selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 61:string)(children: StringColumnInList(col 13, values TermDeposit, RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( _col1 AS DATE)), 'MM-dd-yyyy'))(children: VectorUDFUnixTimeStampDate(col 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 61:string, ConstantVectorExpression(val ) -> 62:string) -> 63:string, ConstantVectorExpression(val ) -> 61:string) -> 62:string
The corresponding part of the query was:
CASE WHEN DLY_BAL.PDELP_VALUE in ( 'TermDeposit', 'RecurringDeposit', 'CertificateOfDeposit' ) THEN NVL( ( from_unixtime( unix_timestamp( cast(DLY_BAL.APATD_MTRTY_DATE as date) ), 'MM-dd-yyyy' ) ), ' ' ) ELSE '' END AS MAT_DTE
The problem, step by step:
1. IfExprCondExprColumn has 62:string as its output column, which is a reused scratch column (see step 5).
2. At evaluation time, isRepeating is reset on that output column.
3. To evaluate IfExprCondExprColumn, its children have to be evaluated conditionally, so we go to conditionalEvaluate.
4. One of the children is ConstantVectorExpression(val ) -> 62:string, which belongs to the second branch of VectorCoalesce, i.e. to the constant passed as NVL's second argument.
5. In step 4, the 62:string column is set to an isRepeating column. Because that constant's column was released by freeNonColumns, it had been marked as a reusable scratch column, which is how it became the output of IfExprCondExprColumn in step 1.
6. After the conditional evaluation in step 3, the final output of IfExprCondExprColumn is set via BytesColumnVector.setElement, but at that point we get this exception:
2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
    at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
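The failure sequence in the steps above can be reproduced in a toy model. This is a minimal sketch, not Hive code: ToyBytesColumn, ToyScratchAllocator and run are hypothetical names, the allocator is assumed to hand released columns out again in LIFO order, and only the assertion message is taken from the trace. It shows how a constant child and the IF output end up sharing column 62, so the child's isRepeating=true leaks into the parent's output.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ScratchReuseBugSketch {

    /** Hypothetical, simplified stand-in for a string column vector (not Hive's class). */
    static final class ToyBytesColumn {
        final String[] values;
        boolean isRepeating;

        ToyBytesColumn(int size) {
            values = new String[size];
        }

        /** Same invariant as the assertion in the trace: a repeating vector only uses entry 0. */
        void setElement(int outIdx, String value) {
            if (isRepeating && outIdx != 0) {
                throw new IllegalStateException(
                        "Output column number expected to be 0 when isRepeating");
            }
            values[outIdx] = value;
        }
    }

    /** Hypothetical scratch allocator: released column numbers are handed out again (LIFO). */
    static final class ToyScratchAllocator {
        private final Deque<Integer> free = new ArrayDeque<>();
        private int next;

        ToyScratchAllocator(int firstScratch) {
            next = firstScratch;
        }

        int allocate() {
            return free.isEmpty() ? next++ : free.pop();
        }

        void release(int col) {
            free.push(col);
        }
    }

    /** Simulates one batch; returns "ok" or the failing column plus the error message. */
    static String run(int batchSize) {
        // Plan time: the NVL constant gets scratch column 62 and is then released
        // (the freeNonColumns step), so the IF output is assigned the same column.
        ToyScratchAllocator alloc = new ToyScratchAllocator(62);
        int constantCol = alloc.allocate();
        alloc.release(constantCol);
        int ifOutputCol = alloc.allocate();

        ToyBytesColumn[] cols = new ToyBytesColumn[64];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = new ToyBytesColumn(batchSize);
        }

        // Run time: the IF output is reset first...
        ToyBytesColumn out = cols[ifOutputCol];
        out.isRepeating = false;

        // ...then conditional evaluation of a child writes a repeating constant
        // into what is physically the very same vector.
        ToyBytesColumn constant = cols[constantCol];
        constant.isRepeating = true;
        constant.values[0] = " ";

        // Finally the IF copies its per-row results into its output column.
        try {
            for (int row = 0; row < batchSize; row++) {
                out.setElement(row, "result-" + row);
            }
            return "ok";
        } catch (IllegalStateException e) {
            return "col " + ifOutputCol + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4));
    }
}
```

With a batch of 4 rows the write to row 1 fails on column 62; with a single-row batch the bug stays hidden, because only entry 0 is written.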
This is clearly an incorrect scratch column reuse: we reused the output column of one of the children, which left that vector in an inconsistent state.
This must not be fixed by resetting vectors in more places in IfExprCondExprColumn, as that would just hide the original issue.
I realized that the problem can easily be fixed by preventing ConstantVectorExpressions from being released; that's what I'm testing now.
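The proposed fix can be illustrated with a small allocator model. This is a minimal sketch under stated assumptions, not the actual Hive patch: ChildExpr, ScratchAllocator, keepConstants and parentOutputColumn are hypothetical names, and the free list is assumed to be LIFO. It only shows the idea that if a constant's output column is never returned to the free list, a parent expression cannot be assigned that column.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class PinnedConstantScratchSketch {

    /** Hypothetical description of a child expression's output column. */
    static final class ChildExpr {
        final int outputColumn;
        final boolean isConstant;

        ChildExpr(int outputColumn, boolean isConstant) {
            this.outputColumn = outputColumn;
            this.isConstant = isConstant;
        }
    }

    /** Hypothetical scratch allocator; released column numbers are reused LIFO. */
    static final class ScratchAllocator {
        private final Deque<Integer> free = new ArrayDeque<>();
        private final boolean keepConstants;
        private int next;

        ScratchAllocator(int firstScratch, boolean keepConstants) {
            this.next = firstScratch;
            this.keepConstants = keepConstants;
        }

        int allocate() {
            return free.isEmpty() ? next++ : free.pop();
        }

        /** Releases children's scratch columns; optionally never releases a constant's column. */
        void freeNonColumns(List<ChildExpr> children) {
            for (ChildExpr child : children) {
                if (keepConstants && child.isConstant) {
                    continue; // the constant's repeating vector stays private to it
                }
                free.push(child.outputColumn);
            }
        }
    }

    /** Allocates a UDF child and a constant child, frees them, then allocates the parent's output. */
    static int parentOutputColumn(boolean keepConstants) {
        ScratchAllocator alloc = new ScratchAllocator(61, keepConstants);
        ChildExpr udf = new ChildExpr(alloc.allocate(), false);     // column 61
        ChildExpr constant = new ChildExpr(alloc.allocate(), true); // column 62
        alloc.freeNonColumns(Arrays.asList(udf, constant));
        return alloc.allocate();
    }

    public static void main(String[] args) {
        System.out.println("releasing constants: parent output = " + parentOutputColumn(false));
        System.out.println("keeping constants:   parent output = " + parentOutputColumn(true));
    }
}
```

Releasing the constant hands column 62 to the parent, reproducing the shared-vector situation; keeping it pinned gives the parent column 61 instead. The trade-off is that constant columns are never recycled, which costs a few extra scratch columns per batch.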
Issue Links
- fixes: HIVE-20990 ORC case when/if with coalesce wrong results or case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating (Open)
- is related to: HIVE-15588 Vectorization: Fix deallocation of scratch columns in VectorUDFCoalesce, etc to prevent wrong reuse (Closed)
- links to