Hive / HIVE-3421

Column Level Top K Values Statistics

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Compute (estimate) top-k value statistics for each column, and record the most skewed column in the table's skewed info if the user has not specified skew.
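
To illustrate the selection step described above, here is a minimal Python sketch. It is not taken from the attached patches. The skew metric (the share of rows covered by a column's top-k values) and the names `top_k`, `skew_score`, and `most_skewed_column` are assumptions made for this example; the ticket does not say how skew is measured.

```python
from collections import Counter


def top_k(values, k):
    """Exact top-k (value, count) pairs for one column."""
    return Counter(values).most_common(k)


def skew_score(values, k):
    """Hypothetical skew metric: fraction of rows covered by the top-k values."""
    total = len(values)
    if total == 0:
        return 0.0
    return sum(count for _, count in top_k(values, k)) / total


def most_skewed_column(table, k):
    """Return the column with the highest skew score.

    `table` maps a column name to the list of values in that column.
    """
    return max(table, key=lambda col: skew_score(table[col], k))


# Toy table: "country" is dominated by one value, "user_id" is uniform.
table = {
    "country": ["US"] * 8 + ["CA", "MX"],
    "user_id": list(range(10)),
}
candidate = most_skewed_column(table, k=1)
```

With this metric, a column that has no more than k distinct values always scores 1.0, so a real implementation would likely need a stricter measure or a small k.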

      This feature depends on ListBucketing (CREATE TABLE ... SKEWED BY ... ON ...): https://cwiki.apache.org/Hive/listbucketing.html

      If skewed info supports multiple independent columns in the future, the top-k values of every column can be added to it.

      The TopK algorithm is based on this paper:
      http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
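
The ticket does not name the algorithm in the linked report. Purely as an illustration of how top-k values can be estimated in one pass with bounded memory, the sketch below uses Space-Saving, a well-known streaming technique. Whether the patches use this approach is an assumption, and the names `space_saving`, `top_k_estimate`, and `capacity` are hypothetical.

```python
def space_saving(stream, capacity):
    """Summarize a stream using at most `capacity` counters.

    Returns {item: (estimated_count, max_overestimation)}.
    The true count lies between estimated_count - max_overestimation
    and estimated_count.
    """
    counters = {}  # item -> [count, error]
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < capacity:
            counters[item] = [1, 0]
        else:
            # Evict the item with the smallest count; the newcomer
            # inherits that count as its possible overestimation.
            victim = min(counters, key=lambda key: counters[key][0])
            min_count = counters.pop(victim)[0]
            counters[item] = [min_count + 1, min_count]
    return {item: (count, error) for item, (count, error) in counters.items()}


def top_k_estimate(stream, k, capacity):
    """Return the k items with the largest estimated counts."""
    summary = space_saving(stream, capacity)
    ranked = sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(item, count) for item, (count, _) in ranked[:k]]


# Toy column: "a" dominates; only two counters are kept in memory.
stream = ["a"] * 6 + ["b", "c", "a", "d", "b", "a"]
summary = space_saving(stream, capacity=2)
result = top_k_estimate(stream, k=1, capacity=2)
```

The estimated counts always sum to the stream length, and a larger `capacity` tightens the error bound at the cost of more memory per column.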

      Attachments

        1. HIVE-3421.patch.1.txt
          189 kB
          Feng Lu
        2. HIVE-3421.patch.2.txt
          197 kB
          Feng Lu
        3. HIVE-3421.patch.3.txt
          197 kB
          Feng Lu
        4. HIVE-3421.patch.4.txt
          202 kB
          Feng Lu
        5. HIVE-3421.patch.5.txt
          203 kB
          Feng Lu
        6. HIVE-3421.patch.6.txt
          203 kB
          Feng Lu
        7. HIVE-3421.patch.7.txt
          203 kB
          Feng Lu
        8. HIVE-3421.patch.8.txt
          257 kB
          Feng Lu
        9. HIVE-3421.patch.9.txt
          265 kB
          Feng Lu
        10. HIVE-3421.patch.txt
          188 kB
          Feng Lu

          People

            farlue Feng Lu
            farlue Feng Lu
            Votes: 5
            Watchers: 12