Hive / HIVE-21398

Columns which have estimated statistics should not be considered as unique keys

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: None
    • Labels: None

    Description

      Right now, for a column to qualify as a unique column, it has to meet the following criterion:

      NDV >= numRows
      

      When numRows is 1 this tends to be true; but numRows is also 1 in cases where we are operating blind and don't know how many rows there are, or more generally, whenever the column statistics are estimated.
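The false positive described above can be illustrated with a minimal sketch. It is not Hive's actual code: `ColStats`, `is_unique_naive`, and `is_unique_guarded` are hypothetical names, and the `estimated` flag is an assumed way of marking statistics that were guessed rather than computed. The guarded variant follows the idea in the issue title; the attached patches may implement it differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColStats:
    """Hypothetical stand-in for a column's statistics (not a Hive class)."""
    name: str
    ndv: int          # number of distinct values
    estimated: bool   # assumption: True when the stats were guessed, not computed


def is_unique_naive(col: ColStats, num_rows: int) -> bool:
    # The criterion from the description: NDV >= numRows.
    return col.ndv >= num_rows


def is_unique_guarded(col: ColStats, num_rows: int) -> bool:
    # One possible guard: never derive uniqueness from estimated statistics.
    return (not col.estimated) and col.ndv >= num_rows


# Table of unknown size: numRows is 1 and the column stats are estimates.
guessed = ColStats(name="c", ndv=1, estimated=True)
# Table with computed stats where the column really has one value per row.
exact = ColStats(name="id", ndv=1000, estimated=False)

print(is_unique_naive(guessed, num_rows=1))     # True (false positive)
print(is_unique_guarded(guessed, num_rows=1))   # False
print(is_unique_guarded(exact, num_rows=1000))  # True
```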

      As a side effect of qualifying all columns as unique, after a few joins all column combinations become unique. So for a join between 3 tables which have (i, j, k) columns respectively, it will allocate i*j*k "unique column triplets".
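To make the growth concrete, here is a simplified model of the blow-up, not the planner's real data structures. It assumes that after a join, every pick of one "unique" column per input table is recorded as a unique key combination; `unique_key_combinations` and the column names are hypothetical.

```python
from itertools import product


def unique_key_combinations(*tables):
    """Return one tuple per way of picking a flagged-unique column from each table.

    Each argument is the list of columns of one joined table that were
    (wrongly) flagged as unique.
    """
    return list(product(*tables))


# Three tables with i=4, j=5 and k=6 columns, all flagged as unique.
a = [f"a{n}" for n in range(4)]
b = [f"b{n}" for n in range(5)]
c = [f"c{n}" for n in range(6)]

combos = unique_key_combinations(a, b, c)
print(len(combos))  # 120, i.e. 4 * 5 * 6 triplets
print(combos[0])    # ('a0', 'b0', 'c0')
```

Even with these small tables the model yields 120 triplets, and the count multiplies again with every additional table that joins in.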

      Attachments

        1. HIVE-21398.01.patch (2 kB, Zoltan Haindrich)
        2. HIVE-21398.02.patch (35 kB, Zoltan Haindrich)

          People

            Assignee: Zoltan Haindrich (kgyrtkirk)
            Reporter: Zoltan Haindrich (kgyrtkirk)