  1. Hive
  2. HIVE-4002

Fetch task aggregation for simple group by query

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. That workload is too small even for a single reducer, because most UDAFs generate just a single row per map task during map-side aggregation. If the final fetch task can aggregate the outputs of the map tasks, the shuffle time can be eliminated.
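
      To illustrate the idea, here is a minimal Python sketch of a count with no group-by, where each map task emits one partial row and the client-side fetch step merges them. It is a toy model, not Hive code: the function names map_side_count and fetch_side_merge and the sample splits are hypothetical.

```python
# Toy model of `select count(*) from src` without a reduce stage.
# All names below are illustrative assumptions, not Hive APIs.

def map_side_count(split):
    """Partial aggregation: each map task emits a single row (its row count)."""
    return len(split)

def fetch_side_merge(partials):
    """Final aggregation done by the fetch step instead of a reducer."""
    return sum(partials)

# Three hypothetical input splits, one per map task.
splits = [["a", "b", "c"], ["d"], ["e", "f"]]

partials = [map_side_count(s) for s in splits]  # one small row per map task
total = fetch_side_merge(partials)              # merged without any shuffle
```

      Because each map task contributes only one row, the merge is cheap enough to run while fetching results, which is what makes the reduce stage unnecessary in this case.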

      This optimization transforms an operator tree such as

      TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK

      into

      TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
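
      The rewrite above can be sketched as a string transformation on the operator chain. This is only an illustration of the shape of the change, not the patch's implementation; the function name rewrite_for_fetch_aggregation is hypothetical. The abbreviations are read here as TS (table scan), FIL (filter), SEL (select), GBY (group by), RS (reduce sink), FS (file sink) and LS (list sink).

```python
# Illustrative sketch only: models the plan rewrite on a dash-separated
# operator chain. The function name is a hypothetical, not a Hive API.

def rewrite_for_fetch_aggregation(plan):
    ops = plan.split("-")
    if "RS" not in ops:
        return None  # no reduce stage, so there is nothing to move

    rs = ops.index("RS")
    # Map side: everything before the reduce sink, now writing to a file sink.
    map_side = ops[:rs] + ["FS"]
    # Former reduce side: moved into the fetch task.
    fetch_side = ops[rs + 1:]
    if fetch_side and fetch_side[-1] == "FS":
        fetch_side[-1] = "LS"  # results go to a list sink instead of a file

    return "-".join(map_side), "FETCH-TASK(" + "-".join(fetch_side) + ")"

job, fetch = rewrite_for_fetch_aggregation("TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS")
```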

      With the patch, the time taken for the auto_join_filters.q test dropped from 10 min to 6 min.

        Attachments

        1. HIVE-4002.D8739.1.patch
          83 kB
          Phabricator
        2. HIVE-4002.D8739.2.patch
          83 kB
          Phabricator
        3. HIVE-4002.D8739.3.patch
          81 kB
          Phabricator
        4. HIVE-4002.D8739.4.patch
          81 kB
          Phabricator
        5. HIVE-4002.patch
          77 kB
          Yin Huai

              People

              • Assignee: Navis
              • Reporter: Navis
              • Votes: 0
              • Watchers: 7
