Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
None
None
Description
Row-count estimation currently works as follows:
- The data size is taken from one of these sources:
  - Basic stats aggregate the loaded "on-heap" row sizes; other readers can provide a "raw size" estimate. I've checked ORC, and I expect other formats do the same, although the API docs are somewhat vague about these methods' purpose.
  - If basic-stats-level information is not available, the filesystem-level "file-size-sums" are used as the "raw data size", which is multiplied by the deserialization ratio (currently 1).
The problem is that the deserialization factor is 1, while the row size estimate includes the in-memory object headers.
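The fallback path (file-size sums scaled by the deserialization ratio) can be sketched as below. This is a minimal, hypothetical illustration, not Hive's code: the class `RowCountSketch` and method `estimateRowCount` are invented names, and it assumes the row count is the scaled raw data size divided by the estimated row size using integer (truncating) arithmetic.

```java
// Hypothetical sketch of the size-based row-count fallback.
// RowCountSketch / estimateRowCount are illustrative names, not Hive's API.
public final class RowCountSketch {

    private RowCountSketch() {}

    /**
     * Estimates a row count from on-disk size when basic stats are missing.
     *
     * @param totalFileSize sum of file sizes reported by the filesystem, in bytes
     * @param deserFactor   deserialization ratio applied to the file size (currently 1)
     * @param avgRowSize    estimated size of one row, in bytes
     */
    public static long estimateRowCount(long totalFileSize, double deserFactor, long avgRowSize) {
        if (avgRowSize <= 0) {
            throw new IllegalArgumentException("avgRowSize must be positive");
        }
        // "raw data size" = file-size sum scaled by the deserialization ratio
        long rawDataSize = (long) (totalFileSize * deserFactor);
        // Integer division truncates: if one estimated row is larger than
        // the whole raw data size, the estimate collapses to 0 rows.
        return rawDataSize / avgRowSize;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: same file size, row size with and without headers.
        System.out.println(estimateRowCount(1_000_000L, 1.0, 100L)); // 10000
        System.out.println(estimateRowCount(1_000_000L, 1.0, 400L)); // 2500
    }
}
```

Both issues named above bias the result downward: a factor of 1 leaves the numerator at the on-disk size, and counting object headers inflates the denominator.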
For example, 20 rows are loaded into a partition in columnstats_partlvl_dp.q.
After HIVE-18108, EXPLAIN estimates the table's row size at 404 bytes, but the 20 rows of text total only 169 bytes, so the estimate ends up at 0 rows.
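Taking 169 bytes as the raw data size and assuming the row count comes from truncating division by the estimated row size (consistent with the 0-row result above), the arithmetic is:

$$
\text{rows} = \left\lfloor \frac{169 \cdot 1}{404} \right\rfloor = \lfloor 0.418\ldots \rfloor = 0
$$

For comparison, the actual 20 rows average 169 / 20 ≈ 8.45 bytes each.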
Issue Links
- blocks
  - HIVE-18108 in case basic stats are missing; rowcount estimation depends on the selected columns size (Resolved)
- is blocked by
  - HIVE-18163 Stats: create materialized view should also collect stats (Closed)