  1. Hive
  2. HIVE-9931

Approximate nDV statistics from ORC bloom filter population

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: Statistics

    Description

      The current cost-based optimizer (CBO) implementation requires column nDV (number of distinct values) statistics to produce good estimates of JOIN selectivity and filter selectivity.

      The ORC bloom filters provide an opportunity to estimate the distinct-value population of a row-group, with the false-positive rate capped for each row-group.
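
As a rough illustration of how a filter's bit population could be turned into a distinct-value estimate, the sketch below applies the standard bloom filter cardinality approximation n ≈ -(m / k) · ln(1 - X / m), where m is the number of bits, k the number of hash functions, and X the number of set bits. The class and method names are hypothetical and not part of ORC or Hive; how m, k, and X would be read from an ORC row-group index is not shown and is assumed to be available.

```java
/**
 * Hypothetical helper (not an ORC or Hive API): estimates how many distinct
 * values were inserted into a bloom filter from the number of bits set.
 */
final class BloomNdvEstimate {

    /**
     * Standard bloom filter cardinality approximation:
     *   n ~= -(m / k) * ln(1 - X / m)
     *
     * @param setBits          X, the number of bits set in the filter
     * @param numBits          m, the total number of bits in the filter
     * @param numHashFunctions k, the number of hash functions
     * @return estimated distinct count, or +Infinity for a saturated filter
     */
    public static double estimateNdv(long setBits, long numBits, int numHashFunctions) {
        if (numBits <= 0 || numHashFunctions <= 0 || setBits < 0) {
            throw new IllegalArgumentException("invalid bloom filter parameters");
        }
        if (setBits >= numBits) {
            // Every bit is set: the filter carries no usable information.
            return Double.POSITIVE_INFINITY;
        }
        double fill = (double) setBits / numBits;
        // log1p(-fill) == ln(1 - fill), computed accurately for small fill.
        return -((double) numBits / numHashFunctions) * Math.log1p(-fill);
    }

    public static void main(String[] args) {
        // Example parameters are made up for illustration only.
        double estimate = estimateNdv(1_370, 8_192, 3);
        System.out.println("estimated nDV: " + estimate);
    }
}
```

The estimate degrades as the filter fills up, which is one reason it is only attractive for low-cardinality columns.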

      This is not useful for filter or join conditions on columns whose cardinality is a large fraction of the row count, but it can yield viable statistics for low-cardinality filter columns (de-normalization scenarios) and for low-cardinality JOIN dimension columns (demographics or store location).

      The challenge in this feature is in distinguishing between these two scenarios, not in the derivation of the approximate nDV itself.
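
One possible way to separate the two scenarios, offered only as a sketch and not as the approach proposed in this issue, is to OR the row-group filters together and reject the estimate when the merged filter is too full or when the estimated nDV is a large fraction of the row count. The thresholds `MAX_FILL` and `MAX_NDV_FRACTION`, the class name, and the assumption that all row-group filters for a column share the same size and hash functions are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical heuristic (not an ORC or Hive API) for deciding whether a bloom-derived nDV is usable. */
final class NdvUsability {

    // Both thresholds are illustrative assumptions, not tuned values.
    static final double MAX_FILL = 0.5;          // reject filters more than half full
    static final double MAX_NDV_FRACTION = 0.1;  // reject when nDV exceeds 10% of the rows

    /** ORs row-group filters; valid only if they share bit size and hash functions. */
    public static BitSet union(BitSet[] rowGroupFilters) {
        BitSet merged = new BitSet();
        for (BitSet filter : rowGroupFilters) {
            merged.or(filter);
        }
        return merged;
    }

    /** Cardinality approximation n ~= -(m / k) * ln(1 - X / m). */
    public static double estimateNdv(BitSet bits, int numBits, int numHashFunctions) {
        int setBits = bits.cardinality();
        if (setBits >= numBits) {
            return Double.POSITIVE_INFINITY; // saturated filter
        }
        return -((double) numBits / numHashFunctions) * Math.log1p(-(double) setBits / numBits);
    }

    /** Returns true when the merged filter looks like a low-cardinality column. */
    public static boolean isUsable(BitSet merged, int numBits, int numHashFunctions, long rowCount) {
        double fill = (double) merged.cardinality() / numBits;
        if (fill > MAX_FILL) {
            return false; // too dense: the estimate is unreliable
        }
        return estimateNdv(merged, numBits, numHashFunctions) <= MAX_NDV_FRACTION * rowCount;
    }
}
```

Merging before estimating avoids double-counting values that repeat across row-groups, since the OR of two compatible bloom filters is the bloom filter of the union of their inputs.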

            People

              Assignee: Unassigned
              Reporter: Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: