Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Compute (estimate) top-k value statistics for each column and, if the user hasn't specified skew, put the most skewed column into the table's skewed info.
This feature depends on ListBucketing (CREATE TABLE ... SKEWED BY ... ON ...): https://cwiki.apache.org/Hive/listbucketing.html.
If skewed info supports multiple independent columns in the future, the top-k values of all columns can be added to it.
The top-k algorithm is based on this paper:
http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
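
To make the proposal concrete, here is a minimal sketch of how per-column top-k estimation and skewed-column selection could fit together. It assumes a counter-based stream summary in the style of Space-Saving, which may differ in detail from what the paper or an eventual Hive implementation does. The function names (`space_saving`, `top_k`, `most_skewed_column`), the `capacity` parameter, and the skew measure (the share of rows taken by a column's most frequent value) are all hypothetical and chosen for illustration only; the issue does not define how "most skewed" is measured.

```python
def space_saving(stream, capacity):
    """Summarize a stream with at most `capacity` counters.

    Returns {value: (estimated_count, max_overestimate)}. Estimated counts
    never undercount; the second number bounds how far they may overcount.
    """
    counts = {}
    errors = {}
    for value in stream:
        if value in counts:
            counts[value] += 1
        elif len(counts) < capacity:
            counts[value] = 1
            errors[value] = 0
        else:
            # Evict the value with the smallest count and let the newcomer
            # inherit that count as its possible overestimate.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[value] = floor + 1
            errors[value] = floor
    return {v: (counts[v], errors[v]) for v in counts}


def top_k(summary, k):
    """Return the k (value, estimated_count) pairs with the largest counts."""
    ranked = sorted(summary.items(), key=lambda item: item[1][0], reverse=True)
    return [(value, count) for value, (count, _error) in ranked[:k]]


def most_skewed_column(columns, k=3, capacity=16):
    """Pick the column whose most frequent value covers the largest share of rows.

    `columns` maps a column name to its list of values. The skew measure is
    an assumption made for this sketch, not something the issue specifies.
    Returns (column_name, top_k_values), or (None, []) if there is no data.
    """
    best_name, best_top, best_share = None, [], -1.0
    for name, values in columns.items():
        if not values:
            continue
        top = top_k(space_saving(values, capacity), k)
        share = top[0][1] / len(values)
        if share > best_share:
            best_name, best_top, best_share = name, top, share
    return best_name, best_top


if __name__ == "__main__":
    table = {
        "country": ["US"] * 8 + ["CA", "MX"],
        "id": list(range(10)),
    }
    name, values = most_skewed_column(table)
    print(name, values)
```

In this sketch the values returned for the chosen column are the candidates that would go into the skewed info. A real implementation would also need a threshold below which no column is treated as skewed, and would have to account for the overestimate bound when `capacity` is smaller than the number of distinct values.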