Details
Bug
Status: Resolved
Minor
Resolution: Won't Fix
Impala 2.5.0
None
Description
For a fixed-length CHAR column, the query issued to compute column statistics calculates the maximum and average column length, even though both are already known from the column data type.
Query issued
SELECT NDV(l_orderkey) AS l_orderkey, CAST(-1 as BIGINT), MAX(length(l_orderkey)), AVG(length(l_orderkey)) FROM tpch_3000_parquet.l_orderkey_char_11
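The idea of skipping the length aggregates for fixed-length columns can be sketched as follows. This is only an illustration, not Impala's actual implementation: the helper name `stats_select_list` and the type-string parsing are hypothetical, and it assumes the declared length of a CHAR(n) column is the desired value for both Max Size and Avg Size.

```python
import re


def stats_select_list(column: str, col_type: str) -> str:
    """Build the select list of a column-stats query (hypothetical helper).

    For CHAR(n) columns the declared length n is emitted as a constant
    instead of MAX(length(col)) and AVG(length(col)).
    """
    ndv = f"NDV({column}) AS {column}"
    nulls = "CAST(-1 AS BIGINT)"  # null count is not computed, as in the original query

    # Assumption: the type is given as a string such as "char(11)".
    match = re.fullmatch(r"char\((\d+)\)", col_type.strip().lower())
    if match:
        n = match.group(1)
        return f"{ndv}, {nulls}, {n}, {n}"

    # Variable-length types still need the aggregates.
    return f"{ndv}, {nulls}, MAX(length({column})), AVG(length({column}))"
```

For the table in this report, `stats_select_list("l_orderkey", "char(11)")` would produce a select list with the constant 11 in place of both length aggregates, so the scan would only need to compute NDV.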
Table schema
describe tpch_3000_parquet.l_orderkey_char_11;
+------------+----------+---------+
| name | type | comment |
+------------+----------+---------+
| l_orderkey | char(11) | |
+------------+----------+---------+
Table statistics
It is not clear why the Avg Size is not 11. I guess an overflow might be happening somewhere, as the row count is 18 billion.
+------------+----------+------------------+--------+----------+-------------------+
| Column     | Type     | #Distinct Values | #Nulls | Max Size | Avg Size          |
+------------+----------+------------------+--------+----------+-------------------+
| l_orderkey | CHAR(11) | 4525097984       | -1     | 11       | 10.38269996643066 |
+------------+----------+------------------+--------+----------+-------------------+