  1. Hive
  2. HIVE-22960

Approximate TopN Key Operator

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-2
    • Component/s: Hive
    • Labels: None

    Description

      "Unlike other operators such as join and group by, the top n operator shows a notable “long tail” characteristic: it saturates very quickly. Updates are frequent at the beginning and then drop off to a very low update frequency.

      The approximation can be implemented in two ways. One is to stop the array/heap update after a certain percentage of the data has been read, for example 10% or 20%, if the table size is known. The other is to set a frequency threshold for the array/heap update; once the threshold is met, top n processing stops."
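
The two stopping rules described above can be sketched as follows. This is only an illustration under stated assumptions, not Hive's operator code: the class `ApproxTopN` and all of its parameters are hypothetical, keys are plain `long` values, the top n means the n largest keys, and the update frequency is measured per window of rows rather than per unit of time so that the example stays deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an approximate top n; not taken from the Hive code base. */
final class ApproxTopN {
    private final int n;            // how many of the largest keys to keep
    private final long rowLimit;    // rule 1: stop after this many rows (0 disables the rule)
    private final int windowRows;   // rule 2: number of rows per observation window
    private final int minUpdates;   // rule 2: stop when a full window has fewer updates than this
    private final PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap of the current top n
    private long rowsSeen = 0;
    private int updatesInWindow = 0;
    private boolean stopped = false;

    ApproxTopN(int n, long rowLimit, int windowRows, int minUpdates) {
        if (n < 1 || windowRows < 1) {
            throw new IllegalArgumentException("n and windowRows must be at least 1");
        }
        this.n = n;
        this.rowLimit = rowLimit;
        this.windowRows = windowRows;
        this.minUpdates = minUpdates;
    }

    /** Feeds one key; returns true if the heap changed. After stopping, keys are ignored. */
    boolean offer(long key) {
        if (stopped) {
            return false;
        }
        rowsSeen++;
        boolean updated = false;
        if (heap.size() < n) {
            heap.add(key);
            updated = true;
        } else if (key > heap.peek()) {
            heap.poll();            // evict the smallest of the current top n
            heap.add(key);
            updated = true;
        }
        if (updated) {
            updatesInWindow++;
        }
        // Rule 1: a fixed share of the input has been read (the caller derives
        // rowLimit from the table size, e.g. 10% of the row count).
        if (rowLimit > 0 && rowsSeen >= rowLimit) {
            stopped = true;
        }
        // Rule 2: the update frequency in the window that just ended is below the threshold.
        if (rowsSeen % windowRows == 0) {
            if (updatesInWindow < minUpdates) {
                stopped = true;
            }
            updatesInWindow = 0;
        }
        return updated;
    }

    boolean isStopped() {
        return stopped;
    }

    long rowsSeen() {
        return rowsSeen;
    }

    /** Current (possibly approximate) top n, largest first. */
    List<Long> result() {
        List<Long> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        // Rule 2 only: windows of 4 rows, stop when a window has no update at all.
        ApproxTopN top = new ApproxTopN(2, 0, 4, 1);
        long[] keys = {10, 9, 1, 2, 3, 4, 5, 6, 100};
        for (long key : keys) {
            top.offer(key);
        }
        System.out.println(top.result() + " stopped=" + top.isStopped());
    }
}
```

In the `main` example the key 100 arrives after processing has stopped, so it is missed. That is the accuracy cost of the approximation: any late key that belongs in the true top n is lost once the heap is frozen.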

      rzhappy

      Y axis: number of updates per 100 ms
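
The long-tail behaviour that the chart refers to can be reproduced with a small simulation. This is a rough sketch under assumptions that are not in the ticket: the class `TopNUpdateProfile` is hypothetical, keys are uniformly random `long` values in random order, and updates are counted per window of rows instead of per 100 ms. Real data that arrives in ascending key order would not show the same drop-off.

```java
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical simulation: counts top n heap updates per window of rows. */
final class TopNUpdateProfile {

    /** Returns the number of heap updates in each consecutive window of windowRows rows. */
    static int[] updatesPerWindow(int n, int totalRows, int windowRows, long seed) {
        if (n < 1 || windowRows < 1) {
            throw new IllegalArgumentException("n and windowRows must be at least 1");
        }
        Random random = new Random(seed);
        PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap of the current top n
        int[] counts = new int[(totalRows + windowRows - 1) / windowRows];
        for (int row = 0; row < totalRows; row++) {
            long key = random.nextLong();
            if (heap.size() < n) {
                heap.add(key);                 // heap not yet full: every row is an update
                counts[row / windowRows]++;
            } else if (key > heap.peek()) {
                heap.poll();                   // replace the smallest of the current top n
                heap.add(key);
                counts[row / windowRows]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = updatesPerWindow(10, 100_000, 10_000, 42L);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("window " + i + ": " + counts[i] + " updates");
        }
    }
}
```

For randomly ordered input, the chance that row i displaces an entry in the heap is roughly n / i, which is why most updates fall into the first window and later windows see only a few.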

          People

            Assignee: Attila Magyar (amagyar)
            Reporter: Attila Magyar (amagyar)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: