Hive: HIVE-3421

Column Level Top K Values Statistics


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Compute (estimate) top-k value statistics for each column and, if the user has not specified skew, record the most skewed column in the table's skewed info.
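      The issue does not define how "most skewed" is measured. The sketch below assumes one possible metric: the fraction of rows covered by a column's estimated top-k values. The names `skew_score` and `pick_skewed_column`, and the input shape (column name mapped to a list of `(value, estimated_count)` pairs), are hypothetical and are not taken from the patch. A user-specified skew always takes precedence, matching the description.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: column name -> list of (value, estimated_count)
# pairs, as produced by some per-column top-k estimator.
TopK = List[Tuple[str, int]]


def skew_score(top_k: TopK, row_count: int) -> float:
    """Fraction of rows covered by a column's top-k values.

    Assumed metric for illustration only; estimated counts may overcount,
    so the result is capped at 1.0.
    """
    if row_count <= 0:
        return 0.0
    return min(1.0, sum(count for _, count in top_k) / row_count)


def pick_skewed_column(
    stats: Dict[str, TopK],
    row_count: int,
    user_skewed_column: Optional[str] = None,
) -> Optional[str]:
    """Return the column to record in skewed info, or None if there is none."""
    # Skew declared by the user wins; statistics are only a fallback.
    if user_skewed_column is not None:
        return user_skewed_column
    if not stats:
        return None
    return max(stats, key=lambda col: skew_score(stats[col], row_count))


stats = {
    "country": [("US", 600), ("IN", 150)],
    "user_id": [("u1", 3), ("u2", 2)],
}
print(pick_skewed_column(stats, row_count=1000))  # country
```

      Here `country` scores 0.75 and `user_id` scores 0.005, so `country` would be chosen. Other metrics (for example, only the single most frequent value's share) would be equally plausible.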

      This feature depends on ListBucketing (CREATE TABLE ... SKEWED BY ... ON ...): https://cwiki.apache.org/Hive/listbucketing.html.
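      The issue does not say how the patch writes skewed info. As an illustration only, the sketch below turns a column's top-k values into the equivalent user-facing List Bucketing DDL (`ALTER TABLE ... SKEWED BY (col) ON (...) STORED AS DIRECTORIES`). The helper names are hypothetical, identifiers are assumed not to need quoting, and values are rendered as Hive string literals using C-style backslash escaping.

```python
from typing import List


def quote_hive_string(value: str) -> str:
    """Render a value as a single-quoted Hive string literal."""
    # Hive string literals use C-style escaping: escape backslashes first,
    # then single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def skewed_by_ddl(
    table: str, column: str, values: List[str], stored_as_dirs: bool = True
) -> str:
    """Build an ALTER TABLE ... SKEWED BY ... ON ... statement (sketch)."""
    if not values:
        raise ValueError("need at least one skewed value")
    literals = ", ".join(quote_hive_string(v) for v in values)
    ddl = f"ALTER TABLE {table} SKEWED BY ({column}) ON ({literals})"
    if stored_as_dirs:
        # List bucketing stores each skewed value in its own directory.
        ddl += " STORED AS DIRECTORIES"
    return ddl


print(skewed_by_ddl("page_views", "country", ["US", "IN"]))
```

      With the arguments above, this prints `ALTER TABLE page_views SKEWED BY (country) ON ('US', 'IN') STORED AS DIRECTORIES`.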

      If skewed info later supports multiple independent columns, the top-k values of every column could be added to it.

      The TopK algorithm is based on this paper:
      http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
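      The linked UCSB report (2005-23) is, as far as we can tell, the Space-Saving paper by Metwally, Agrawal and El Abbadi; this is an assumption, not confirmed in the issue. Below is a minimal sketch of the Space-Saving idea: keep at most `capacity` counters; when a new value arrives and all counters are in use, evict the value with the smallest count and let the newcomer inherit that count plus one, recording the inherited amount as its possible overestimation. The paper uses a "Stream-Summary" structure for constant-time updates; this sketch swaps in a simple linear scan for the minimum, which is easier to read but O(capacity) per eviction. Class and method names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Space-Saving top-k sketch with a fixed number of counters."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # value -> (estimated count, maximum overestimation error)
        self.counters: Dict[Hashable, Tuple[int, int]] = {}

    def offer(self, item: Hashable) -> None:
        if item in self.counters:
            count, err = self.counters[item]
            self.counters[item] = (count + 1, err)
        elif len(self.counters) < self.capacity:
            self.counters[item] = (1, 0)
        else:
            # Evict the value with the minimum count (linear scan here,
            # instead of the paper's Stream-Summary structure).
            victim = min(self.counters, key=lambda k: self.counters[k][0])
            min_count, _ = self.counters.pop(victim)
            self.counters[item] = (min_count + 1, min_count)

    def top(self, k: int) -> List[Tuple[Hashable, int, int]]:
        """Return up to k (value, estimated_count, error), highest count first."""
        ranked = sorted(self.counters.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(item, count, err) for item, (count, err) in ranked[:k]]


sketch = SpaceSaving(capacity=3)
for value in "aaaaabbbccd":
    sketch.offer(value)
print(sketch.top(1))  # [('a', 5, 0)]
```

      For every value still monitored, `estimated_count - error <= true_count <= estimated_count`. In the example, `d` (true count 1) displaces `c` and is reported as `('d', 3, 2)`, which satisfies that bound. A per-column variant would run one such sketch over each column's values.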

        Attachments

        1. HIVE-3421.patch.9.txt
          265 kB
          Feng Lu
        2. HIVE-3421.patch.8.txt
          257 kB
          Feng Lu
        3. HIVE-3421.patch.7.txt
          203 kB
          Feng Lu
        4. HIVE-3421.patch.6.txt
          203 kB
          Feng Lu
        5. HIVE-3421.patch.5.txt
          203 kB
          Feng Lu
        6. HIVE-3421.patch.4.txt
          202 kB
          Feng Lu
        7. HIVE-3421.patch.3.txt
          197 kB
          Feng Lu
        8. HIVE-3421.patch.2.txt
          197 kB
          Feng Lu
        9. HIVE-3421.patch.1.txt
          189 kB
          Feng Lu
        10. HIVE-3421.patch.txt
          188 kB
          Feng Lu


            People

            • Assignee: Feng Lu (farlue)
            • Reporter: Feng Lu (farlue)
            • Votes: 6
            • Watchers: 11
