Details
Type: Sub-task
Status: Patch Available
Priority: Major
Resolution: Unresolved
Description
Parquet keeps statistics for each column chunk, including a boolean, hasNonNullValue, that records whether the chunk contains any non-null value. Hive could use this to skip reading a column chunk that is entirely null, avoid null-checking logic, and speed up vectorized execution as in HIVE-4478 (a future step, since Parquet vectorization is not yet complete).
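A minimal sketch of how such statistics could drive the read path, assuming the chunk's null count is available alongside hasNonNullValue (Parquet statistics can carry one). ChunkStats, Plan, LongVector, plan() and fill() are hypothetical names for illustration, not Hive or parquet-mr APIs; the noNulls/isRepeating/isNull flags are only loosely modeled on Hive's vectorized column vectors.

```java
/**
 * Hypothetical sketch: picking a read strategy for one Parquet column chunk
 * from its statistics. ChunkStats, Plan, and LongVector are illustrative
 * stand-ins, not Hive or parquet-mr classes.
 */
public class NullStatsSketch {

    /** Assumed per-chunk statistics: hasNonNullValue plus a null count. */
    record ChunkStats(boolean hasNonNullValue, long numNulls) {}

    enum Plan { ALL_NULL, NO_NULLS, CHECK_EACH_ROW }

    /** Chooses how much null handling the chunk needs. */
    static Plan plan(ChunkStats s) {
        if (!s.hasNonNullValue()) {
            return Plan.ALL_NULL;       // only nulls: skip decoding values
        }
        if (s.numNulls() == 0) {
            return Plan.NO_NULLS;       // no nulls: skip per-row null checks
        }
        return Plan.CHECK_EACH_ROW;     // mixed: check every row
    }

    /** Simplified batch column, loosely modeled on a vectorized column vector. */
    static final class LongVector {
        final long[] values;
        final boolean[] isNull;
        boolean noNulls = true;
        boolean isRepeating = false;

        LongVector(int size) {
            values = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Fills the batch; in `source`, a null element stands for a null value. */
    static void fill(LongVector v, Long[] source, Plan plan) {
        switch (plan) {
            case ALL_NULL -> {
                // Source is never read: entry 0 stands for every row.
                v.noNulls = false;
                v.isRepeating = true;
                v.isNull[0] = true;
            }
            case NO_NULLS -> {
                // Trusting the statistics, copy values without testing for null.
                v.noNulls = true;
                for (int i = 0; i < source.length; i++) {
                    v.values[i] = source[i];
                }
            }
            case CHECK_EACH_ROW -> {
                v.noNulls = false;
                for (int i = 0; i < source.length; i++) {
                    v.isNull[i] = source[i] == null;
                    if (!v.isNull[i]) {
                        v.values[i] = source[i];
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(plan(new ChunkStats(false, 4)));  // ALL_NULL
        System.out.println(plan(new ChunkStats(true, 0)));   // NO_NULLS
        System.out.println(plan(new ChunkStats(true, 1)));   // CHECK_EACH_ROW
    }
}
```

The null count matters for the NO_NULLS branch: hasNonNullValue alone only tells a reader that a chunk is not entirely null, so it is enough to skip an all-null chunk but not to drop per-row null checks.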
In this JIRA, we could check whether this null optimization works and make any changes needed.