Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: Impala 4.2.0
Description
If an argument of a GenericUDF is NULL, Impala passes a null reference in the arguments array instead of a DeferredObject:
https://github.com/apache/impala/blob/5abbb9bd17373c8aafe6d213d328e16934cdca07/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorGeneric.java#L74
This seems to be wrong. The example GenericUDFs I checked in Hive assume that the DeferredObject argument itself is never null; it is the DeferredObject's get() function that can return null:
https://github.com/apache/hive/blob/7082fd1dfd087c99e6f00a7a0e95a30e198fede8/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java#L165
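The difference between the two conventions can be illustrated with a minimal, self-contained sketch. It does not use Hive or Impala classes: the nested `DeferredObject` interface is a hypothetical stand-in for Hive's `GenericUDF.DeferredObject` (simplified so that `get()` declares no checked exception), and `passNullSlots`, `wrapEveryArgument` and `identity` are illustrative names, not code from either project.

```java
class NullArgumentSketch {
    // Hypothetical stand-in for Hive's GenericUDF.DeferredObject, so the
    // sketch compiles without Hive on the classpath.
    interface DeferredObject {
        Object get();
    }

    // Mimics the behaviour described in this issue: a NULL argument becomes
    // a null slot in the array handed to the UDF.
    static DeferredObject[] passNullSlots(Object[] values) {
        DeferredObject[] deferred = new DeferredObject[values.length];
        for (int i = 0; i < values.length; i++) {
            final Object value = values[i];
            if (value != null) {
                deferred[i] = () -> value;
            }
        }
        return deferred;
    }

    // Mimics the convention the Hive UDFs appear to rely on: every argument
    // is wrapped, and a NULL value is only visible through get().
    static DeferredObject[] wrapEveryArgument(Object[] values) {
        DeferredObject[] deferred = new DeferredObject[values.length];
        for (int i = 0; i < values.length; i++) {
            final Object value = values[i];
            deferred[i] = () -> value;
        }
        return deferred;
    }

    // An identity-style UDF body that, like the Hive examples, dereferences
    // its argument without checking the array slot for null.
    static Object identity(DeferredObject[] args) {
        return args[0].get();
    }

    public static void main(String[] args) {
        Object[] row = { null };
        // Wrapped NULL: the UDF simply returns null.
        System.out.println(identity(wrapEveryArgument(row)));
        try {
            // Null slot: dereferencing args[0] fails inside the UDF.
            identity(passNullSlots(row));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from null slot");
        }
    }
}
```

With the wrapped form the UDF decides for itself how to treat a NULL value; with the null slot it fails before it can look at the value at all.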
This also makes sense, as one of the goals of DeferredObject is lazy evaluation, so the caller may not know whether the argument is null before calling get():
https://github.com/apache/hive/blob/7082fd1dfd087c99e6f00a7a0e95a30e198fede8/ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java#L92
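To make the lazy-evaluation point concrete, here is a small sketch in the spirit of an IF-style function. It is an assumption-laden illustration, not Hive's implementation: `DeferredObject` is again a hypothetical stand-in interface, and `LazyArgument` and `evaluateIf` are invented names. Each argument is computed only when get() is called, so nullness cannot be known up front.

```java
import java.util.function.Supplier;

class LazyIfSketch {
    // Hypothetical stand-in for Hive's GenericUDF.DeferredObject.
    interface DeferredObject {
        Object get();
    }

    // Evaluates the wrapped expression only when get() is called and counts
    // how often that happens.
    static final class LazyArgument implements DeferredObject {
        private final Supplier<Object> expression;
        int evaluations = 0;

        LazyArgument(Supplier<Object> expression) {
            this.expression = expression;
        }

        @Override
        public Object get() {
            evaluations++;
            return expression.get();
        }
    }

    // IF(condition, thenValue, elseValue): only the selected branch is
    // evaluated. A NULL condition is treated as false in this sketch.
    static Object evaluateIf(DeferredObject[] args) {
        Object condition = args[0].get();
        boolean takeThen = condition != null && (Boolean) condition;
        return takeThen ? args[1].get() : args[2].get();
    }

    public static void main(String[] args) {
        LazyArgument condition = new LazyArgument(() -> null);
        LazyArgument thenBranch = new LazyArgument(() -> "then");
        LazyArgument elseBranch = new LazyArgument(() -> "else");
        Object result = evaluateIf(
            new DeferredObject[] { condition, thenBranch, elseBranch });
        System.out.println(result + ", then evaluated "
            + thenBranch.evaluations + " time(s)");
    }
}
```

A caller that replaced NULL arguments with null slots would have to evaluate every argument eagerly just to find out which ones are NULL, which defeats this laziness.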
Even Impala's test UDFs throw an exception for NULL:
create function generic_identity(int) returns int
location '/test-warehouse/impala-hive-udfs.jar'
symbol='org.apache.impala.TestGenericUdf';

select generic_identity(cast(NULL as int));

WARNINGS: UDF WARNING: Hive UDF path=hdfs://localhost:20500/test-warehouse/impala-hive-udfs.jar class=org.apache.impala.TestGenericUdf failed due to: NullPointerException: null