Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
None
None
None
Description
When running UPDATE STATISTICS, it is sometimes desirable to create a sample table manually and have UPDATE STATISTICS use it, instead of its own automatically generated sample, when generating statistics for a given base table. This may be needed to work around a bug in the automatic sample-table logic, or to manipulate the sample data directly.
Two CQDs (Control Query Defaults) currently serve this purpose:
USTAT_SAMPLE_TABLE_NAME: if set, names a table that UPDATE STATISTICS assumes is a user-created sample table.
USTAT_USE_BACKING_SAMPLE: if set to 'ON', indicates that this sample table is a Hive table.
We could simplify this by eliminating USTAT_USE_BACKING_SAMPLE and relying on the catalog and schema qualifiers of the table name in USTAT_SAMPLE_TABLE_NAME to indicate whether the sample table is a Hive table, following the usual rules (that is, if the catalog name is 'HIVE', it is a Hive table).
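The difference between the two schemes can be sketched as a small model. This is a minimal illustration under stated assumptions, not the UPDATE STATISTICS code: SampleTableChoice, resolve_current, qualify and resolve_proposed are hypothetical names, an unset table name is assumed to mean "use the automatic sample", only regular (undelimited) identifiers are handled, and the session's default catalog and schema are passed in as parameters.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SampleTableChoice:
    """Hypothetical result: which sample table to use and whether it is Hive."""
    table_name: Optional[str]  # None => fall back to the automatic sample
    is_hive: bool


def resolve_current(sample_table_name: str, use_backing_sample: str) -> SampleTableChoice:
    """Current scheme: USTAT_SAMPLE_TABLE_NAME names the table and
    USTAT_USE_BACKING_SAMPLE = 'ON' alone decides that it is a Hive table."""
    if not sample_table_name:
        # Assumption: an unset name means no user-created sample table.
        return SampleTableChoice(None, False)
    return SampleTableChoice(sample_table_name,
                             use_backing_sample.strip().upper() == "ON")


def qualify(name: str, default_catalog: str, default_schema: str) -> Tuple[str, str, str]:
    """Expand a one-, two- or three-part name to (catalog, schema, table).
    Only regular identifiers are handled; they are upper-cased."""
    parts = [p.strip().upper() for p in name.split(".")]
    if len(parts) > 3 or any(not p for p in parts):
        raise ValueError(f"not a valid table name: {name!r}")
    defaults = [default_catalog.upper(), default_schema.upper()]
    # Missing leading qualifiers are taken from the defaults.
    catalog, schema, table = defaults[: 3 - len(parts)] + parts
    return catalog, schema, table


def resolve_proposed(sample_table_name: str,
                     default_catalog: str,
                     default_schema: str) -> SampleTableChoice:
    """Proposed scheme: a single CQD; Hive-ness follows from the catalog name."""
    if not sample_table_name:
        return SampleTableChoice(None, False)
    catalog, schema, table = qualify(sample_table_name, default_catalog, default_schema)
    return SampleTableChoice(f"{catalog}.{schema}.{table}", catalog == "HIVE")
```

For example, resolve_proposed("hive.hive.s1", "CAT", "SCH") yields a Hive choice for HIVE.HIVE.S1, while an unqualified name inherits the default catalog. In the current-scheme model, the name and the ON/OFF flag are independent, so they can disagree (a non-Hive catalog flagged as Hive); deriving Hive-ness from the name removes that possibility.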
Other logic associated with these CQDs attempts to infer the sampling ratio. This logic is inconsistent and appears to be incorrect, so it should be re-engineered as needed and then documented appropriately.