Hive / HIVE-23788

FilterStatsRule misestimate causes hashtable computation to rehash often

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Depending on which statistics are available, FilterStatsRule sometimes estimates the output row count as numRows/3. This projects a lower keyCount for the hashtable computation, so the hashtable ends up being rehashed often.

      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L952

      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L1192

      E.g. TPCDS Q74 @ 10TB. While evaluating the conditions "t_s_firstyear.year_total > 0, t_w_secyear.year_total / t_w_firstyear.year_total, t_s_secyear.year_total / t_s_firstyear.year_total", the rule projects one third of the rows, which causes the hashtable in the downstream vertex to be rehashed.

      May have to check whether stats can be projected for these columns correctly.
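      To illustrate why an underestimate of this kind leads to rehashing, here is a minimal sketch. It is not Hive code: the class RehashSketch, its method names, the 0.75 load factor, and the power-of-two doubling policy are all assumptions made for illustration, and the row counts are made-up numbers, not measurements from Q74. It sizes a hashtable from the numRows/3 estimate and counts how many times the table would have to double if far more keys actually arrive.

```java
public class RehashSketch {

    // Hypothetical stand-in for the fallback described above: with no usable
    // column statistics, the filter output is assumed to be numRows / 3.
    static long fallbackFilterEstimate(long numRows) {
        return numRows / 3;
    }

    // Assumed sizing policy: smallest power-of-two capacity that can hold
    // expectedKeys without exceeding the load factor.
    static long initialCapacity(long expectedKeys, double loadFactor) {
        long needed = (long) Math.ceil(expectedKeys / loadFactor);
        long capacity = 1;
        while (capacity < needed) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Counts how many doublings (each one a full rehash) are needed before
    // the table can hold actualKeys, starting from a size based on expectedKeys.
    static int countResizes(long expectedKeys, long actualKeys, double loadFactor) {
        long capacity = initialCapacity(expectedKeys, loadFactor);
        int resizes = 0;
        while (actualKeys > capacity * loadFactor) {
            capacity <<= 1;
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        long numRows = 3_000_000L;   // illustrative input size only
        double loadFactor = 0.75;    // assumed load factor
        long estimated = fallbackFilterEstimate(numRows);

        System.out.println("estimated keys: " + estimated);
        System.out.println("resizes when every row passes the filter: "
                + countResizes(estimated, numRows, loadFactor));
        System.out.println("resizes with an accurate estimate: "
                + countResizes(numRows, numRows, loadFactor));
    }
}
```

      In this sketch the table sized from the numRows/3 estimate must grow at least once when most rows survive the filter, while a table sized from an accurate estimate does not. Each growth step re-inserts every key already stored, which is the cost this issue is about.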

       

            People

            • Assignee: Unassigned
            • Reporter: Rajesh Balamohan (rajesh.balamohan)