Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Impala 3.2.0
Description
A STRING column without an avg_size statistic contributes 11 bytes to the row-size estimate, although it occupies 12 bytes in the tuple (plus additional memory elsewhere if the string is not empty). The cause is that -1 (meaning unknown) is added to the 12-byte slot size.
I don't think this causes problems in practice, as the estimate is probably far off without statistics anyway, but row-size >= tuple size seems like a meaningful invariant that we shouldn't break.
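The arithmetic behind the 11-byte figure can be sketched as follows. This is only an illustration in Python, not Impala's planner code: the function names and constants are hypothetical, and clamping the unknown value to 0 is just one way to restore the invariant, not necessarily how the fix was implemented.

```python
# Hypothetical names and constants for illustration; not Impala's actual code.
STRING_SLOT_SIZE = 12   # bytes a STRING slot occupies in the tuple (per the report)
UNKNOWN_AVG_SIZE = -1   # sentinel meaning "no avg_size statistic available"


def buggy_row_size(avg_size: int) -> int:
    """Adds the sentinel directly, so an unknown size gives 12 + (-1) = 11."""
    return STRING_SLOT_SIZE + avg_size


def clamped_row_size(avg_size: int) -> int:
    """Assumed fix: treat an unknown (negative) avg_size as 0 extra bytes."""
    return STRING_SLOT_SIZE + max(avg_size, 0)


print(buggy_row_size(UNKNOWN_AVG_SIZE))    # 11, below the 12-byte tuple size
print(clamped_row_size(UNKNOWN_AVG_SIZE))  # 12, row-size >= tuple size holds
```

With the clamp, the estimate can never fall below the slot size, whatever value the statistic takes.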
Reproduce:
create table test_row_size (s string);
explain select * from test_row_size;

Result:
...
WARNING: The following tables are missing relevant table and/or column statistics.
default.test_row_size
...
00:SCAN HDFS [default.test_row_size]
   partitions=1/1 files=0 size=0B
   row-size=11B cardinality=0
Issue Links
- breaks: IMPALA-8849 IllegalStateException in HashJoinNode because of missing memory estimate (Resolved)
- is related to: IMPALA-8849 IllegalStateException in HashJoinNode because of missing memory estimate (Resolved)