Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Several improvements can be made to estimate row counts even when stats are not available for a table.
There are various factors to consider here:
- Handling for partitioned vs. non-partitioned tables
- Handling for partitioned tables can be tricky if the table is in a mixed state, where some partitions have row counts while others do not
- Interoperability with other systems such as Hive and Spark
- Users can run alter table statements to manually set the value of the row count
- Handling of corrupt stats vs. missing stats
- Corrupt stats are defined as a row count value less than -1, or a value of 0 when the underlying table has non-empty files
- Missing stats are stats that have just not been computed, and are marked as such with the value -1
This JIRA will be used to track the various improvements via sub-tasks.
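The definitions of missing and corrupt stats given above can be sketched as a small classifier. This is only an illustration of the stated rules, not the actual implementation: the names `StatsState`, `classify_row_count`, and `is_mixed_state` are hypothetical, and the sketch assumes the caller already knows whether the underlying table or partition has non-empty files.

```python
from enum import Enum
from typing import Iterable, Tuple


class StatsState(Enum):
    VALID = "valid"
    MISSING = "missing"
    CORRUPT = "corrupt"


def classify_row_count(num_rows: int, has_nonempty_files: bool) -> StatsState:
    """Classify a stored row count using the rules from the description."""
    if num_rows == -1:
        # -1 marks stats that have simply not been computed.
        return StatsState.MISSING
    if num_rows < -1:
        # Any value below -1 is treated as corrupt.
        return StatsState.CORRUPT
    if num_rows == 0 and has_nonempty_files:
        # Zero rows cannot be right if there is data on disk.
        return StatsState.CORRUPT
    return StatsState.VALID


def is_mixed_state(partitions: Iterable[Tuple[int, bool]]) -> bool:
    """Return True if some partitions have usable row counts and others do not.

    Each partition is a (num_rows, has_nonempty_files) pair (hypothetical shape).
    """
    states = {classify_row_count(n, f) is StatsState.VALID for n, f in partitions}
    return states == {True, False}
```

For example, `is_mixed_state([(100, True), (-1, True)])` reports a mixed state, since one partition has a row count and the other has missing stats.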
Sub-Tasks
1. Display the number of estimated rows for a table | Open | Unassigned
2. Table level stats are not honored when partition has corrupt stats | Open | Unassigned