Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
4.0.0
None
Description
Currently, Hive re-estimates column stats (including avgColLen) whenever it encounters a UDF.
For upper and lower, Hive sets avgColLen to the value of hive.stats.max.variable.length.
However, these UDFs do not change column stats, and the default value (100) is too high for string-type key columns, to which upper/lower are usually applied.
This patch keeps the input column's avgColLen after UDF upper/lower is applied, so that the optimizer can produce a better query plan.
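
The intended behavior can be illustrated with a minimal, self-contained sketch. This is not the code from the patch or any Hive API: the class AvgColLenSketch, the method estimateAvgColLen, and the LENGTH_PRESERVING set are hypothetical names chosen for illustration, and the fallback simply stands in for the value of hive.stats.max.variable.length.

```java
import java.util.Locale;
import java.util.Set;

public class AvgColLenSketch {
    // Hypothetical list of UDFs treated as length-preserving in this sketch.
    private static final Set<String> LENGTH_PRESERVING = Set.of("upper", "lower");

    /**
     * Estimates avgColLen for the output of a UDF applied to a string column.
     *
     * @param udfName           name of the UDF (assumed to be a plain function name)
     * @param inputAvgColLen    avgColLen of the input column
     * @param maxVariableLength fallback, standing in for hive.stats.max.variable.length
     */
    public static double estimateAvgColLen(String udfName, double inputAvgColLen,
                                           int maxVariableLength) {
        if (LENGTH_PRESERVING.contains(udfName.toLowerCase(Locale.ROOT))) {
            // upper/lower: keep the input column's avgColLen.
            return inputAvgColLen;
        }
        // Any other UDF: fall back to the configured maximum.
        return maxVariableLength;
    }

    public static void main(String[] args) {
        System.out.println(estimateAvgColLen("upper", 8.0, 100));  // prints 8.0
        System.out.println(estimateAvgColLen("concat", 8.0, 100)); // prints 100.0
    }
}
```

With an 8-character key column, the sketch keeps avgColLen at 8.0 for upper instead of raising it to the default of 100, which is the overestimate described above.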