Hive / HIVE-9392

JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: Physical Optimizer
    • Labels: None

    Description

      In JoinStatsRule.process, the join column statistics are stored in the HashMap joinedColStats, keyed by ColStatistics.fqColName. That key is duplicated across join columns in the same vertex, so distinctVals ends up holding duplicated values, which distorts the join cardinality estimate.

      The duplicate keys are usually named KEY.reducesinkkey0.
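
The effect described above can be illustrated with a small, self-contained sketch. It is not Hive's code: the class `DuplicateKeyDemo`, the record `ColStat`, the position-qualified key, and the textbook estimate `leftRows * rightRows / max(NDV)` are all assumptions made for illustration. It only shows how two columns sharing the name KEY.reducesinkkey0 collapse into one map entry, so the same NDV is read back twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateKeyDemo {

    // Hypothetical stand-in for a column-statistics record; not Hive's ColStatistics.
    public static final class ColStat {
        public final String fqColName;
        public final long ndv;

        public ColStat(String fqColName, long ndv) {
            this.fqColName = fqColName;
            this.ndv = ndv;
        }
    }

    // Stores stats keyed by name only, then reads one NDV back per join column.
    // A later put() with the same name replaces the earlier entry.
    public static List<Long> ndvsKeyedByName(List<ColStat> joinCols) {
        Map<String, ColStat> byName = new HashMap<>();
        for (ColStat cs : joinCols) {
            byName.put(cs.fqColName, cs);
        }
        List<Long> ndvs = new ArrayList<>();
        for (ColStat cs : joinCols) {
            ndvs.add(byName.get(cs.fqColName).ndv);
        }
        return ndvs;
    }

    // Same idea, but the key is qualified by the column's position (an assumed
    // disambiguation for this sketch) so no entry is overwritten.
    public static List<Long> ndvsKeyedUniquely(List<ColStat> joinCols) {
        Map<String, ColStat> byKey = new HashMap<>();
        for (int i = 0; i < joinCols.size(); i++) {
            byKey.put(i + ":" + joinCols.get(i).fqColName, joinCols.get(i));
        }
        List<Long> ndvs = new ArrayList<>();
        for (int i = 0; i < joinCols.size(); i++) {
            ndvs.add(byKey.get(i + ":" + joinCols.get(i).fqColName).ndv);
        }
        return ndvs;
    }

    // Textbook single-key join estimate: leftRows * rightRows / max(NDV).
    public static long estimateJoinRows(long leftRows, long rightRows, List<Long> ndvs) {
        long denominator = 1L;
        for (long ndv : ndvs) {
            denominator = Math.max(denominator, ndv);
        }
        return leftRows * rightRows / denominator;
    }

    public static void main(String[] args) {
        // Both sides of the join carry the same generated column name.
        List<ColStat> joinCols = Arrays.asList(
                new ColStat("KEY.reducesinkkey0", 1000L),  // left side
                new ColStat("KEY.reducesinkkey0", 10L));   // right side

        long leftRows = 1_000_000L;
        long rightRows = 10_000L;

        System.out.println("name-keyed NDVs:   " + ndvsKeyedByName(joinCols));
        System.out.println("unique-keyed NDVs: " + ndvsKeyedUniquely(joinCols));
        System.out.println("name-keyed estimate:   "
                + estimateJoinRows(leftRows, rightRows, ndvsKeyedByName(joinCols)));
        System.out.println("unique-keyed estimate: "
                + estimateJoinRows(leftRows, rightRows, ndvsKeyedUniquely(joinCols)));
    }
}
```

With these made-up inputs, the name-keyed lookup returns the NDV 10 for both columns, so the estimate is 100 times larger than the one computed from the distinct NDVs 1000 and 10. The direction and size of the error in a real plan depend on which entry survives and on the estimator actually used.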

      Attachments

        1. HIVE-9392.6.patch (36 kB, Pengcheng Xiong)
        2. HIVE-9392.5.patch (30 kB, Pengcheng Xiong)
        3. HIVE-9392.4.patch (2 kB, Pengcheng Xiong)
        4. HIVE-9392.3.patch (0.7 kB, Pengcheng Xiong)
        5. HIVE-9392.2.patch (106 kB, Prasanth Jayachandran)
        6. HIVE-9392.1.patch (24 kB, Prasanth Jayachandran)

            People

              Assignee: Pengcheng Xiong (pxiong)
              Reporter: Mostafa Mokhtar (mmokhtar)
              Votes: 0
              Watchers: 4
