HIVE-16298

Add config to specify multi-column joins have correlated columns

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 3.0.0
    • None

    Description

      The default row estimate for a multi-key join divides the row count by the product of the NDVs (numbers of distinct values) of the join columns. When the join columns are correlated, this makes the estimate too low. Add a config option that assumes the columns are correlated, so that the row count is divided only by the largest NDV.
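
      To make the difference concrete, here is a minimal sketch of the two estimation modes. It is a simplified textbook model, not Hive's actual optimizer code: the function name `estimate_join_rows`, the `assume_correlated` flag, and the input numbers are all hypothetical, and it assumes the caller has already chosen one NDV per join column.

```python
from math import prod


def estimate_join_rows(left_rows, right_rows, join_col_ndvs, assume_correlated=False):
    """Estimate the output row count of an equi-join on several key columns.

    Hypothetical helper for illustration only.
    join_col_ndvs holds one NDV (number of distinct values) per join column.
    """
    if not join_col_ndvs:
        raise ValueError("at least one join column NDV is required")

    # Clamp each NDV to 1 so the denominator can never be zero.
    ndvs = [max(1, ndv) for ndv in join_col_ndvs]

    if assume_correlated:
        # Correlated columns: only the most selective column reduces the estimate.
        denominator = max(ndvs)
    else:
        # Independent columns: every column's NDV reduces the estimate.
        denominator = prod(ndvs)

    # Never estimate fewer than one row.
    return max(1, (left_rows * right_rows) // denominator)


# Two 1,000,000-row tables joined on two columns with NDVs 1000 and 500.
independent = estimate_join_rows(1_000_000, 1_000_000, [1000, 500])
correlated = estimate_join_rows(1_000_000, 1_000_000, [1000, 500], assume_correlated=True)
print(independent)  # 2000000
print(correlated)   # 1000000000
```

      In this made-up example the independence assumption yields an estimate 500 times smaller than the correlated one, which is the kind of underestimate the description refers to.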

      Attachments

        1. HIVE-16298.1.patch
          16 kB
          Jason Dere
        2. HIVE-16298.2.patch
          16 kB
          Jason Dere

            People

              sseth Siddharth Seth
              jdere Jason Dere
              Votes: 0
              Watchers: 6
