Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 2.10.0, Impala 2.11.0
None
Description
We should introduce a minimum sample size in bytes for COMPUTE STATS TABLESAMPLE. Reasons:
- For small tables, sampling does not make sense: accurate stats can be obtained cheaply without it.
- Very small samples mostly do not make sense either: some minimum amount of data is required to get meaningful stats.
I think a 1GB minimum might be a good choice, and ideally this minimum sample size would be configurable.
Many other DBMSs support stats collection with sampling, and in many cases they require a minimum sample size to get meaningful stats.
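
To make the proposal concrete, here is a rough sketch of how a minimum sample size could interact with the requested sampling percentage. It is only an illustration of the idea, not Impala's implementation: the function name `effective_sample_bytes`, the constant `DEFAULT_MIN_SAMPLE_BYTES`, and the choice to fall back to a full scan when the minimum reaches the table size are all assumptions made for this example.

```python
GIB = 1024 ** 3

# Hypothetical default: the description only suggests "1GB" and that the
# value should ideally be configurable.
DEFAULT_MIN_SAMPLE_BYTES = 1 * GIB


def effective_sample_bytes(table_bytes, sample_percent,
                           min_sample_bytes=DEFAULT_MIN_SAMPLE_BYTES):
    """Return the number of bytes to sample, or None to skip sampling.

    None means "compute stats over the whole table", which is the assumed
    behavior when the table is too small for sampling to pay off.
    """
    if table_bytes < 0 or min_sample_bytes < 0:
        raise ValueError("sizes must be non-negative")
    if not 0 <= sample_percent <= 100:
        raise ValueError("sample_percent must be between 0 and 100")

    # Bytes the user asked for via the sampling percentage.
    requested = table_bytes * sample_percent // 100

    # Never sample less than the configured minimum.
    target = max(requested, min_sample_bytes)

    # If the sample would cover the whole table, sampling buys nothing.
    if target >= table_bytes:
        return None
    return target
```

Under these assumptions, a 500MB table is always scanned in full, a 5GB table sampled at 1% is bumped up to a 1GB sample, and a 100GB table sampled at 10% keeps its requested 10GB sample.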