Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Statistics computed on Hive columns at the partition level can be rolled up, which avoids scanning the table again to compute column statistics at the table (global) level. While it's straightforward to roll up some statistics, such as max, min, avgcollen, and maxcollen, rolling up others, such as ndv, requires maintaining intermediate state. This ticket covers two tasks: a) maintaining the intermediate state needed to roll up partition-level statistics, and b) detecting that the partition-level statistics can be rolled up and then computing table-level statistics from them.
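
To make the distinction concrete, here is a minimal Python sketch of both tasks. It is an illustration only, not Hive's implementation: the class `PartitionColStats` and the functions `can_roll_up`, `roll_up`, and `ndv` are hypothetical names. It assumes avgcollen is averaged over all rows of a partition, and it uses a plain set of distinct values as a stand-in for the NDV intermediate state, where a real system would more likely keep a bounded-size mergeable sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class PartitionColStats:
    """Hypothetical per-partition statistics for a single column."""
    num_rows: int
    num_nulls: int
    min_val: int
    max_val: int
    avg_col_len: float
    max_col_len: int
    # Intermediate state for NDV. A set of distinct values stands in for a
    # mergeable sketch (assumption); None means no state was kept.
    ndv_state: Optional[FrozenSet[int]] = None


def can_roll_up(parts: List[PartitionColStats]) -> bool:
    """Task (b), detection: every partition must carry NDV state."""
    return len(parts) > 0 and all(p.ndv_state is not None for p in parts)


def ndv(stats: PartitionColStats) -> int:
    """Number of distinct values derived from the intermediate state."""
    if stats.ndv_state is None:
        raise ValueError("no NDV intermediate state available")
    return len(stats.ndv_state)


def roll_up(parts: List[PartitionColStats]) -> Optional[PartitionColStats]:
    """Compute table-level stats from partition-level stats.

    Returns None when roll-up is not possible, in which case the caller
    would have to fall back to scanning the table.
    """
    if not can_roll_up(parts):
        return None
    total_rows = sum(p.num_rows for p in parts)
    # The average must be weighted by row count; a plain mean of the
    # per-partition averages would be wrong for uneven partitions.
    if total_rows:
        avg_len = sum(p.avg_col_len * p.num_rows for p in parts) / total_rows
    else:
        avg_len = 0.0
    # NDV cannot be summed because values may repeat across partitions,
    # so the intermediate states are merged instead.
    merged = frozenset().union(*(p.ndv_state for p in parts))
    return PartitionColStats(
        num_rows=total_rows,
        num_nulls=sum(p.num_nulls for p in parts),
        min_val=min(p.min_val for p in parts),
        max_val=max(p.max_val for p in parts),
        avg_col_len=avg_len,
        max_col_len=max(p.max_col_len for p in parts),
        ndv_state=merged,
    )
```

For example, two partitions whose distinct values are {1, 2, 3} and {3, 4} roll up to an ndv of 4, whereas adding the per-partition counts would give 5. If any partition lacks the intermediate state, `roll_up` returns `None` rather than producing an incorrect table-level value.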