  1. Hive
  2. HIVE-17896

TopNKey: Create a standalone vectorizable TopNKey operator


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.2.0, 4.0.0-alpha-1
    • Component/s: Operators
    • Labels: None

    Description

      For TPC-DS Query27, the TopN operation is delayed by the group-by: the group-by operator buffers all the rows, and only afterwards does the TopN hash within the ReduceSink operator discard 99% of them.

      The RS TopN operator is very restrictive because it only supports filtering on the shuffle keys. It is better to filter earlier, before the vectors are broken into rows and the isRepeating properties are lost.
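To illustrate why filtering at the vector level is attractive, here is a minimal sketch, not Hive's actual code. `LongKeyColumn` and `KeyBatch` are hypothetical, simplified stand-ins for a key column vector and a row batch, and the fixed `threshold` stands in for whatever boundary key a TopN filter would currently hold. When the key column is repeating, a single comparison decides the fate of the whole batch.

```java
/** Hypothetical sketch of batch-level key filtering; all names are illustrative. */
final class VectorKeyFilterSketch {

    /** Simplified stand-in for a long column vector. */
    static final class LongKeyColumn {
        final long[] vector;
        final boolean isRepeating; // when true, vector[0] is the value of every row

        LongKeyColumn(long[] vector, boolean isRepeating) {
            this.vector = vector;
            this.isRepeating = isRepeating;
        }
    }

    /** Simplified stand-in for a row batch with a selection vector. */
    static final class KeyBatch {
        final LongKeyColumn key;
        final int[] selected;  // indices of live rows when selectedInUse is true
        boolean selectedInUse;
        int size;              // number of live rows

        KeyBatch(LongKeyColumn key, int size) {
            this.key = key;
            this.size = size;
            this.selected = new int[size];
        }
    }

    /** Keeps only the rows whose key is less than or equal to the threshold. */
    static void keepKeysAtMost(KeyBatch batch, long threshold) {
        if (batch.size == 0) {
            return;
        }
        if (batch.key.isRepeating) {
            // One comparison covers the whole batch; isRepeating stays intact.
            if (batch.key.vector[0] > threshold) {
                batch.size = 0;
            }
            return;
        }
        int kept = 0;
        for (int i = 0; i < batch.size; i++) {
            int row = batch.selectedInUse ? batch.selected[i] : i;
            if (batch.key.vector[row] <= threshold) {
                batch.selected[kept++] = row; // kept <= i, so in-place update is safe
            }
        }
        batch.selectedInUse = true;
        batch.size = kept;
    }
}
```

Once the batch has been broken into individual rows, the repeating-key shortcut is no longer available and every row has to be compared separately.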

      Adding a TopN Key operator in the physical operator tree allows the following to happen.

      GBY->RS(Top=1)

      can become

      TNK(1)->GBY->RS(Top=1)

      This way, the TopNKey operator can remove rows before they are buffered in the GBY and consume memory.

      Here's the equivalent implementation in Presto:

      https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35

      Adding this as a sub-feature of GroupBy prevents further optimizations if the GBY is on keys "a,b,c" and the TopNKey is on just "a".
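The idea of a standalone key filter can be sketched as follows. This is an illustrative assumption about how such an operator might behave, not the implementation from the attached patches: `TopNKeySketch`, `TopNKeyFilter`, `canForward`, and `forward` are hypothetical names, and rows are modeled as plain `String[]` arrays with columns (a, b, c). The filter tracks only the N smallest distinct values of "a" seen so far and drops any row whose key sorts after all of them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of a standalone TopN key filter; names are illustrative. */
final class TopNKeySketch {

    static final class TopNKeyFilter<K> {
        private final int topN;
        private final Comparator<? super K> order;
        private final TreeSet<K> bestKeys; // at most topN distinct keys

        TopNKeyFilter(int topN, Comparator<? super K> order) {
            if (topN <= 0) {
                throw new IllegalArgumentException("topN must be positive");
            }
            this.topN = topN;
            this.order = order;
            this.bestKeys = new TreeSet<>(order);
        }

        /** Returns true if a row with this key may still belong to the top N keys. */
        boolean canForward(K key) {
            if (bestKeys.size() < topN) {
                bestKeys.add(key);
                return true;
            }
            if (order.compare(key, bestKeys.last()) > 0) {
                return false; // sorts after every tracked key: safe to drop
            }
            if (bestKeys.add(key)) {
                bestKeys.pollLast(); // a better key arrived; evict the worst one
            }
            return true;
        }
    }

    /** Filters rows shaped (a, b, c) using only column "a" as the TopN key. */
    static List<String[]> forward(List<String[]> rows, int topN) {
        TopNKeyFilter<String> filter =
            new TopNKeyFilter<>(topN, Comparator.<String>naturalOrder());
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            if (filter.canForward(row[0])) {
                out.add(row);
            }
        }
        return out;
    }
}
```

A filter like this can forward rows whose key later drops out of the top N, so in this sketch it only reduces the input. The final cut would still be made downstream, which is consistent with the plan above keeping RS(Top=1) after the GBY.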

      Attachments

        1. HIVE-17896.1.patch
          84 kB
          Teddy Choi
        2. HIVE-17896.10.patch
          311 kB
          Teddy Choi
        3. HIVE-17896.11.patch
          138 kB
          Teddy Choi
        4. HIVE-17896.12.patch
          137 kB
          Teddy Choi
        5. HIVE-17896.13.patch
          1.38 MB
          Teddy Choi
        6. HIVE-17896.3.patch
          143 kB
          Teddy Choi
        7. HIVE-17896.4.patch
          149 kB
          Teddy Choi
        8. HIVE-17896.5.patch
          150 kB
          Teddy Choi
        9. HIVE-17896.6.patch
          329 kB
          Teddy Choi
        10. HIVE-17896.7.patch
          329 kB
          Teddy Choi
        11. HIVE-17896.8.patch
          329 kB
          Teddy Choi
        12. HIVE-17896.9.patch
          330 kB
          Teddy Choi


          People

            Assignee: Teddy Choi (teddy.choi)
            Reporter: Gopal Vijayaraghavan (gopalv)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved:
