  Hive / HIVE-4002

Fetch task aggregation for simple group by query

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: Query Processor
    • Labels: None

    Description

      Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. That work is too small even for a single reducer, because most UDAFs generate just a single row per map task during map-side aggregation. If the final fetch task can aggregate the outputs from the map tasks, the shuffle time can be eliminated.
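
      As a rough illustration of why the final step is cheap, the sketch below folds one partial row per map task into a single result inside one process, which is the kind of work the fetch task would take over. It is a toy model, not Hive code: PartialAgg, merge_partials, and the sample values are hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PartialAgg:
    """One row emitted by a map task's map-side aggregation (hypothetical shape)."""
    count: int
    total: int


def merge_partials(partials: Iterable[PartialAgg]) -> PartialAgg:
    """Fold per-map-task partial rows into the final row, without a shuffle."""
    result = PartialAgg(count=0, total=0)
    for p in partials:
        result.count += p.count
        result.total += p.total
    return result


# Three map tasks, each producing a single partial row (made-up values).
map_outputs = [PartialAgg(3, 30), PartialAgg(5, 50), PartialAgg(2, 20)]
final = merge_partials(map_outputs)
```

      The input to the merge has one row per map task, so its cost grows with the number of map tasks rather than with the size of the table.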

      This optimization transforms an operator tree such as

      TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK

      into

      TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
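
      The rewrite can be pictured as cutting the plan at the RS operator and moving everything after it into the fetch task. The sketch below does this on the string notation used above, assuming the usual Hive abbreviations (RS for ReduceSink, FS for FileSink, LS for ListSink). The function name is hypothetical and the code only manipulates strings; it is not how the patch itself is implemented.

```python
def push_final_aggregation_to_fetch(plan: str) -> str:
    """Rewrite 'A-...-RS-...-FS + FETCH-TASK' into the fetch-side form.

    Toy string model only: operators are dash-separated abbreviations.
    """
    job, _, fetch = plan.partition(" + ")
    ops = job.split("-")
    if fetch != "FETCH-TASK" or "RS" not in ops or ops[-1] != "FS":
        return plan  # shape not recognized; leave the plan untouched
    cut = ops.index("RS")
    map_side = ops[:cut] + ["FS"]          # map output is written straight to files
    fetch_side = ops[cut + 1:-1] + ["LS"]  # former reduce-side operators, ending in LS
    return "-".join(map_side) + " + FETCH-TASK(" + "-".join(fetch_side) + ")"


before = "TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK"
after = push_final_aggregation_to_fetch(before)
```

      Applied to the first tree above, this yields the second one; a plan without an RS operator is returned unchanged.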

      With the patch, the time taken for the auto_join_filters.q test dropped from 10 min to 6 min.

      Attachments

        1. HIVE-4002.D8739.1.patch
          83 kB
          Phabricator
        2. HIVE-4002.D8739.2.patch
          83 kB
          Phabricator
        3. HIVE-4002.D8739.3.patch
          81 kB
          Phabricator
        4. HIVE-4002.D8739.4.patch
          81 kB
          Phabricator
        5. HIVE-4002.patch
          77 kB
          Yin Huai

            People

              Assignee: Navis Ryu (navis)
              Reporter: Navis Ryu (navis)
              Votes: 0
              Watchers: 7
