Hive / HIVE-3516

Fast incremental statistics computation on columns in Hive tables


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Statistics
    • Labels: None

    Description

      Statistics computed on Hive columns at the partition level can be rolled up to the table (global) level, which avoids scanning the table again to compute table-level column statistics. While it's straightforward to roll up some statistics, such as max, min, avgcollen, and maxcollen, rolling up others, such as ndv, requires maintaining intermediate state. This ticket covers the tasks of a) maintaining the intermediate state needed to roll up partition-level statistics, and b) detecting that the partition-level statistics can be rolled up, and then actually computing table-level statistics from them.
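
      The roll-up described above can be illustrated with a minimal Python sketch. It is not Hive's implementation and does not reflect any design chosen for this ticket. All names (`ColumnStats`, `compute_partition_stats`, `roll_up`, `estimate_ndv`, `K`) are hypothetical, and a K-minimum-values (KMV) sketch is assumed here as one possible form of the intermediate state for ndv. The toy column holds strings so that a single column carries every statistic. Rolling up avgcollen as a weighted average assumes the row count is stored per partition.

```python
import hashlib
from dataclasses import dataclass

K = 64              # hypothetical sketch size: number of smallest hashes kept
HASH_SPACE = 2**64  # hashes are 64-bit unsigned integers


def _hash(value) -> int:
    """Deterministic 64-bit hash of a column value."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class ColumnStats:
    num_rows: int
    min_val: str
    max_val: str
    avg_col_len: float
    max_col_len: int
    kmv: list  # sorted K smallest distinct hashes: the ndv intermediate state


def compute_partition_stats(values):
    """Scan one (non-empty) partition once and build its statistics."""
    lens = [len(v) for v in values]
    return ColumnStats(
        num_rows=len(values),
        min_val=min(values),
        max_val=max(values),
        avg_col_len=sum(lens) / len(lens),
        max_col_len=max(lens),
        kmv=sorted({_hash(v) for v in values})[:K],
    )


def estimate_ndv(kmv):
    """Estimate the number of distinct values from a KMV sketch."""
    if len(kmv) < K:
        # Fewer than K distinct hashes seen: the count is exact
        # (ignoring hash collisions).
        return len(kmv)
    return int((K - 1) * HASH_SPACE / kmv[-1])


def roll_up(parts):
    """Combine partition-level statistics without rescanning any data."""
    total_rows = sum(p.num_rows for p in parts)
    return ColumnStats(
        num_rows=total_rows,
        min_val=min(p.min_val for p in parts),
        max_val=max(p.max_val for p in parts),
        # Weighted by row count; a plain mean of the averages would be wrong.
        avg_col_len=sum(p.avg_col_len * p.num_rows for p in parts) / total_rows,
        max_col_len=max(p.max_col_len for p in parts),
        # Union the sketches and keep the K smallest hashes again.
        kmv=sorted(set().union(*(p.kmv for p in parts)))[:K],
    )
```

      Summing the per-partition ndv values would count a value once for every partition it appears in. Merging the sketches counts it once, which is why ndv needs stored intermediate state while min and max do not.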


            People

              Assignee: Shreepadma Venugopalan
              Reporter: Shreepadma Venugopalan
              Votes: 2
              Watchers: 4