  1. Apache Quickstep
  2. QUICKSTEP-57

FinalizeAggregation Performance Improvement


Details

    Description

      GROUP BY aggregation currently runs in two steps:
      1. Aggregation: tuples from StorageBlocks are aggregated into separate hash tables (performed by the Aggregation operator). There are as many hash tables as worker threads, and each thread uses only one hash table at a time.
      2. Merging: the aggregation hash tables are merged into one (performed by the FinalizeAggregation operator).

      Step 2 is needed because the same GROUP BY key can be present in multiple hash tables, so the payloads for that key must be merged.
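
To make the cost of step 2 concrete, here is a minimal C++ sketch of the two-step scheme. It is not Quickstep code: `Tuple`, `HashTable`, `AggregatePerWorker`, and `MergeTables` are hypothetical names, the aggregate is assumed to be SUM over an integer key, and the worker threads are simulated sequentially.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a tuple is (GROUP BY key, value); the aggregate is SUM.
using Tuple = std::pair<std::int64_t, std::int64_t>;
using HashTable = std::unordered_map<std::int64_t, std::int64_t>;

// Step 1: each worker aggregates its own input into a private hash table.
// Workers are simulated sequentially, one input vector per worker.
std::vector<HashTable> AggregatePerWorker(
    const std::vector<std::vector<Tuple>> &input_per_worker) {
  std::vector<HashTable> tables(input_per_worker.size());
  for (std::size_t w = 0; w < input_per_worker.size(); ++w) {
    for (const Tuple &t : input_per_worker[w]) {
      tables[w][t.first] += t.second;
    }
  }
  return tables;
}

// Step 2: the same key may appear in several tables, so every entry of
// every table has to be visited and its payload folded into one result.
HashTable MergeTables(const std::vector<HashTable> &tables) {
  HashTable merged;
  for (const HashTable &table : tables) {
    for (const auto &entry : table) {
      merged[entry.first] += entry.second;
    }
  }
  return merged;
}
```

In this sketch the merge touches every entry of every per-worker table, which is the work the issue aims to eliminate.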

      We can avoid step 2 if the hash tables built in step 1 have no overlap in their GROUP BY keys. One way to achieve this is to partition the aggregated tuples by their GROUP BY keys, so that every tuple with a given key lands in the same hash table.
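
The partitioning idea could look roughly like the following C++ sketch. It is an illustration under assumptions, not the proposed implementation: `PartitionOf` and `AggregatePartitioned` are hypothetical names, the partition is chosen by hashing an integer key, the aggregate is SUM, and concurrency control (who is allowed to write to which partition) is left out entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a tuple is (GROUP BY key, value); the aggregate is SUM.
using Tuple = std::pair<std::int64_t, std::int64_t>;
using HashTable = std::unordered_map<std::int64_t, std::int64_t>;

// Map a GROUP BY key to a partition. Equal keys always map to the same
// partition, so no key can appear in two different hash tables.
std::size_t PartitionOf(std::int64_t key, std::size_t num_partitions) {
  return std::hash<std::int64_t>{}(key) % num_partitions;
}

// Aggregate into one hash table per partition. The tables are disjoint in
// their keys, so their union is already the final result without a merge.
std::vector<HashTable> AggregatePartitioned(const std::vector<Tuple> &tuples,
                                            std::size_t num_partitions) {
  std::vector<HashTable> tables(num_partitions);
  for (const Tuple &t : tuples) {
    tables[PartitionOf(t.first, num_partitions)][t.first] += t.second;
  }
  return tables;
}
```

Because each key lives in exactly one table, the final result is simply the concatenation of the partitions. A multi-threaded version would still have to ensure that a partition's table is not written by two threads at once, which this sketch does not address.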

            People

              hbdeshmukh@apache.org Harshad Deshmukh
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: