Hive / HIVE-15796

HoS: poor reducer parallelism when operator stats are not accurate


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: Statistics

    Description

      In HoS (Hive on Spark) we currently use operator stats to determine reducer parallelism. However, operator stats are often inaccurate, especially when column stats are not available. This can produce extremely poor reducer parallelism and cause HoS queries to appear to run forever.

      This JIRA proposes an alternative way to compute reducer parallelism, similar to how MR does it. Here is the approach we are suggesting:
      1. when computing the parallelism for a MapWork, use stats associated with the TableScan operator;
      2. when computing the parallelism for a ReduceWork, use the maximum parallelism from all its parents.
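
      The two rules above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the attached patches: the class `ReducerParallelismSketch`, the `Work` type, and the `bytesPerReducer` / `maxReducers` parameters are hypothetical names, and the MR-style "input bytes divided by bytes per reducer, capped at a maximum" formula is an assumption about how the TableScan stats would be turned into a number.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Hive code base. */
final class ReducerParallelismSketch {

    /** Simplified stand-in for a MapWork or ReduceWork node in the plan. */
    static final class Work {
        final String name;
        final boolean isMapWork;
        // Data size taken from TableScan stats; only used for map works.
        final long tableScanBytes;
        final List<Work> parents = new ArrayList<>();

        Work(String name, boolean isMapWork, long tableScanBytes) {
            this.name = name;
            this.isMapWork = isMapWork;
            this.tableScanBytes = tableScanBytes;
        }

        Work withParents(Work... ps) {
            for (Work p : ps) {
                parents.add(p);
            }
            return this;
        }
    }

    /**
     * Rule 1: a map work derives its parallelism from TableScan data size
     * (assumed MR-style: ceil(bytes / bytesPerReducer), clamped to [1, maxReducers]).
     * Rule 2: a reduce work takes the maximum parallelism of its parents.
     */
    static int parallelism(Work work, long bytesPerReducer, int maxReducers) {
        if (work.isMapWork) {
            long n = work.tableScanBytes / bytesPerReducer;
            if (work.tableScanBytes % bytesPerReducer != 0) {
                n++; // round up for a partial last chunk
            }
            return (int) Math.max(1L, Math.min((long) maxReducers, n));
        }
        int best = 1;
        for (Work parent : work.parents) {
            best = Math.max(best, parallelism(parent, bytesPerReducer, maxReducers));
        }
        return best;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Work small = new Work("Map 1", true, 1024 * mb);          // 1 GB scanned
        Work large = new Work("Map 2", true, 10 * 1024 * mb);     // 10 GB scanned
        Work join = new Work("Reducer 3", false, 0).withParents(small, large);
        Work agg = new Work("Reducer 4", false, 0).withParents(join);

        System.out.println(parallelism(join, 256 * mb, 1009)); // prints 40
        System.out.println(parallelism(agg, 256 * mb, 1009));  // prints 40
    }
}
```

      In this sketch the downstream reducer inherits the parallelism of its widest parent, so a chain of reducers never drops below the number derived from the largest table scan, regardless of how inaccurate the intermediate operator stats are.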

      Attachments

        1. HIVE-15796.wip.patch
          9 kB
          Chao Sun
        2. HIVE-15796.wip.1.patch
          9 kB
          Chao Sun
        3. HIVE-15796.wip.2.patch
          9 kB
          Chao Sun
        4. HIVE-15796.1.patch
          50 kB
          Chao Sun
        5. HIVE-15796.2.patch
          52 kB
          Chao Sun
        6. HIVE-15796.3.patch
          11 kB
          Chao Sun
        7. HIVE-15796.4.patch
          29 kB
          Chao Sun
        8. HIVE-15796.5.patch
          30 kB
          Chao Sun
        9. HIVE-15796.6.patch
          30 kB
          Chao Sun

            People

              Assignee: Chao Sun (csun)
              Reporter: Chao Sun (csun)
              Votes: 0
              Watchers: 4
