Kylin / KYLIN-2501

Stream Aggregate GTRecords at Query Server

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • v1.6.0
    • v2.0.0
    • None

    Description

      Problem
      When the query server needs to handle millions of records, CubeTupleConverter can become a performance bottleneck.
      An experiment shows that converting 5 million records takes ~11s, which accounts for 50% of the total query time.

      Motivation
      Records returned from each storage partition are guaranteed to be ordered. Therefore we can reduce the number of records passed to CubeTupleConverter by:

      1. merging the sorted records from all partitions, similar to what we did in KYLIN-1787
      2. using a stream aggregate algorithm on the merged stream to aggregate records that share the same key
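
The two steps above can be sketched in a few lines. This is only an illustration in Python, not the code from the patch: records are assumed to be hypothetical `(key, value)` pairs, each partition is assumed to be sorted by key, and the aggregate is assumed to be a simple sum. The names `merge_partitions` and `stream_aggregate` are made up for this sketch.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_partitions(partitions):
    """Lazily merge already-sorted partitions into one stream sorted by key."""
    return heapq.merge(*partitions, key=itemgetter(0))


def stream_aggregate(sorted_records):
    """Yield one (key, sum) per run of equal keys; input must be sorted by key."""
    for key, run in groupby(sorted_records, key=itemgetter(0)):
        yield key, sum(value for _, value in run)


# Three hypothetical storage partitions, each sorted by key.
partitions = [
    [("a", 1), ("c", 5)],
    [("a", 2), ("b", 3)],
    [("b", 4), ("c", 6)],
]

aggregated = list(stream_aggregate(merge_partitions(partitions)))
print(aggregated)  # [('a', 3), ('b', 7), ('c', 11)]
```

Six input records collapse into three before anything downstream (the tuple converter, in the case of this issue) has to touch them.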

      Proposal

      1. Add a new physical operator GTStreamAggregateScanner which implements the stream aggregate algorithm
      2. Refine SortedIteratorMergerWithLimit, which is used to merge-sort records from different partitions. The previous implementation has performance issues (KYLIN-2483) due to expensive record cloning
      3. Leverage GTStreamAggregateScanner to aggregate records on merged stream

      Scope
      Stream aggregate has some good properties, such as low memory usage and streamable ordered output, which make it better than hash- or sort-based alternatives when the input is already sorted. So I bet the new GTStreamAggregateScanner operator can also be used to accelerate cubing and coprocessor aggregation in certain cases. I'll focus on the query server in this jira and leave those optimizations as future work.
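
To make the memory and streaming claim concrete, here is a small Python sketch that contrasts the two approaches. It is an assumption-laden illustration, not Kylin code: records are hypothetical `(key, value)` pairs, the aggregate is a sum, and `counting` is a helper invented here to observe how much input has been read.

```python
def counting(records, counter):
    """Pass records through while counting how many have been read."""
    for record in records:
        counter["read"] += 1
        yield record


def stream_aggregate(sorted_records):
    """Key-sorted input: keeps only the current group's state in memory."""
    it = iter(sorted_records)
    try:
        current_key, current_sum = next(it)
    except StopIteration:
        return
    for key, value in it:
        if key != current_key:
            # Key changed, so the previous group is complete and can be emitted.
            yield current_key, current_sum
            current_key, current_sum = key, 0
        current_sum += value
    yield current_key, current_sum


def hash_aggregate(records):
    """Any input order: keeps every distinct key until the input is exhausted."""
    groups = {}
    for key, value in records:
        groups[key] = groups.get(key, 0) + value
    return groups


records = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5)]

counter = {"read": 0}
stream = stream_aggregate(counting(records, counter))
first = next(stream)                      # first group is available early
records_read_for_first = counter["read"]  # only the "a" run plus one lookahead
rest = list(stream)

hash_groups = hash_aggregate(records)     # nothing is available until the end
```

The stream version emits `("a", 3)` after reading three records and never holds more than one group, while the hash version holds all three groups and produces nothing until the whole input is consumed.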

      Attachments

        1. KYLIN-2501.patch
          68 kB
          Dayue Gao

          People

            Assignee: gaodayue Dayue Gao
            Reporter: gaodayue Dayue Gao
            Votes: 0
            Watchers: 3
