Hive / HIVE-3421

Column Level Top K Values Statistics


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Compute (estimate) top-k value statistics for each column and, if the user has not specified skew, store the most skewed column in the table's skewed info.

      This feature depends on list bucketing (CREATE TABLE ... SKEWED BY ... ON ...): https://cwiki.apache.org/Hive/listbucketing.html

      If skewed info later supports multiple independent columns, the top-k values of every column could be added to it.

      The top-k algorithm is based on this paper:
      http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
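
To make the idea concrete, here is a minimal Python sketch of a bounded-memory top-k counter in the style of the Space-Saving technique for data streams, followed by a helper that picks the most skewed column. It is not taken from the attached patches, and whether the patches follow this exact scheme is an assumption. The names `SpaceSaving`, `most_skewed_column`, `capacity`, and the skew measure (the share of rows held by a column's most frequent value) are all hypothetical choices made for illustration.

```python
class SpaceSaving:
    """Approximate top-k counter that tracks at most `capacity` distinct values."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts = {}  # value -> estimated count (never an underestimate)
        self.errors = {}  # value -> maximum possible overestimation
        self.total = 0    # number of values seen so far

    def offer(self, value):
        """Feed one value from the stream into the counter."""
        self.total += 1
        if value in self.counts:
            self.counts[value] += 1
        elif len(self.counts) < self.capacity:
            self.counts[value] = 1
            self.errors[value] = 0
        else:
            # Evict the value with the smallest count. The newcomer inherits
            # that count, which is also the bound on its overestimation.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[value] = floor + 1
            self.errors[value] = floor

    def top(self, k):
        """Return up to k (value, estimated_count) pairs, largest first."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
        return ranked[:k]


def most_skewed_column(rows, columns, k=3, capacity=16):
    """Pick the column whose most frequent value covers the largest share of rows.

    The skew measure is an assumption made for this sketch, not the ticket's rule.
    """
    sketches = {c: SpaceSaving(capacity) for c in columns}
    for row in rows:
        for c in columns:
            sketches[c].offer(row[c])

    def skew(c):
        sketch = sketches[c]
        top = sketch.top(1)
        return top[0][1] / sketch.total if top else 0.0

    best = max(columns, key=skew)
    return best, sketches[best].top(k)
```

With rows where a `country` column is dominated by a single value and an `id` column is unique per row, `most_skewed_column(rows, ["country", "id"])` would select `country`. Because estimated counts can only overestimate, a stricter variant could rank columns by `counts[v] - errors[v]`, the guaranteed lower bound.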

      Attachments

        1. HIVE-3421.patch.9.txt
          265 kB
          Feng Lu
        2. HIVE-3421.patch.8.txt
          257 kB
          Feng Lu
        3. HIVE-3421.patch.7.txt
          203 kB
          Feng Lu
        4. HIVE-3421.patch.6.txt
          203 kB
          Feng Lu
        5. HIVE-3421.patch.5.txt
          203 kB
          Feng Lu
        6. HIVE-3421.patch.4.txt
          202 kB
          Feng Lu
        7. HIVE-3421.patch.3.txt
          197 kB
          Feng Lu
        8. HIVE-3421.patch.2.txt
          197 kB
          Feng Lu
        9. HIVE-3421.patch.1.txt
          189 kB
          Feng Lu
        10. HIVE-3421.patch.txt
          188 kB
          Feng Lu

          People

            Assignee: Feng Lu (farlue)
            Reporter: Feng Lu (farlue)
