  1. Hive
  2. HIVE-22939

Datasketches support

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      We could probably integrate more closely with DataSketches; it has very useful algorithms that could be utilized in various ways:

      • provide an optional way to transparently rewrite count(distinct) to use some distinct counting sketch
      • frequent items could be gathered during statistics collection; knowing the most frequent elements could be extremely helpful in giving more accurate estimates for our plans
      • it also has a way to estimate a CDF, which might be useful in giving better estimates for range queries
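
To make the first two ideas concrete, here is a minimal, self-contained sketch in Python. It is not the DataSketches implementation and not Hive code: `KmvSketch` is a textbook K-Minimum-Values distinct counter (the idea that Theta sketches build on), and `FrequentItems` is a plain Misra-Gries summary. All class names, method names, and parameters are hypothetical and chosen only for illustration.

```python
import hashlib
import heapq


def _hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using a stable hash."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0 ** 64


class KmvSketch:
    """Illustrative K-Minimum-Values sketch: keeps the k smallest distinct hashes."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap of kept hashes (stored negated)
        self._members = set()  # same hashes, for duplicate detection

    def _add_hash(self, h):
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, value):
        self._add_hash(_hash01(value))

    def merge(self, other):
        """Union of two sketches, e.g. partial results from separate tasks."""
        merged = KmvSketch(min(self.k, other.k))
        for h in self._members | other._members:
            merged._add_hash(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct values: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


class FrequentItems:
    """Illustrative Misra-Gries summary holding at most k - 1 counters."""

    def __init__(self, k):
        self.k = k
        self.counters = {}

    def update(self, item):
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]
```

A count(distinct) rewrite would, conceptually, build one such sketch per partial aggregation, merge the partial sketches, and return `estimate()` instead of an exact count. The Misra-Gries summary retains every item that occurs more than n/k times in a stream of n rows, which is the kind of information statistics collection could record for the planner.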

      https://datasketches.apache.org/

          People

            Assignee: Unassigned
            Reporter: Zoltan Haindrich (kgyrtkirk)
            Votes: 0
            Watchers: 5

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 18h 50m