Hive
  1. Hive
  2. HIVE-7063

Optimize for the Top N within a Group use case

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:
      None

      Description

      It is common to rank within a Group/Partition and then only return the Top N entries within each Group.
      With Streaming mode for Windowing, we should push the post filter on the rank into the Windowing processing as a Limit expression.

      1. HIVE-7063.1.patch
        8 kB
        Harish Butani
      2. HIVE-7063.2.patch
        46 kB
        Harish Butani
      3. HIVE-7063.3.patch
        53 kB
        Harish Butani

        Activity

        Thejas M Nair made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Thejas M Nair added a comment -

        This has been fixed in 0.14 release. Please open new jira if you see any issues.

        Show
        Thejas M Nair added a comment - This has been fixed in 0.14 release. Please open new jira if you see any issues.
        Hide
        Lefty Leverenz added a comment -

        Pinging Harish Butani: No doc needed for this optimization?

        Show
        Lefty Leverenz added a comment - Pinging Harish Butani : No doc needed for this optimization?
        Hide
        Lefty Leverenz added a comment -

        No user doc for this?

        Show
        Lefty Leverenz added a comment - No user doc for this?
        Ashutosh Chauhan made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.14.0 [ 12326450 ]
        Resolution Fixed [ 1 ]
        Hide
        Ashutosh Chauhan added a comment -

        Committed to trunk. Thanks, Harish!

        Show
        Ashutosh Chauhan added a comment - Committed to trunk. Thanks, Harish!
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12653080/HIVE-7063.3.patch

        ERROR: -1 due to 2 failed/errored test(s), 5671 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
        org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/630/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/630/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-630/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 2 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12653080

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12653080/HIVE-7063.3.patch ERROR: -1 due to 2 failed/errored test(s), 5671 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/630/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/630/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-630/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed This message is automatically generated. ATTACHMENT ID: 12653080
        Hide
        Ashutosh Chauhan added a comment -

        +1

        Show
        Ashutosh Chauhan added a comment - +1
        Hide
        Harish Butani added a comment -

        thanks Ashutosh Chauhan. Have uploaded a patch addressing the issues you raised

        Show
        Harish Butani added a comment - thanks Ashutosh Chauhan . Have uploaded a patch addressing the issues you raised
        Harish Butani made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Harish Butani made changes -
        Attachment HIVE-7063.3.patch [ 12653080 ]
        Harish Butani made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Ashutosh Chauhan added a comment -

        Make sense. I left few comments on RB.

        Show
        Ashutosh Chauhan added a comment - Make sense. I left few comments on RB.
        Hide
        Harish Butani added a comment -

        Yes, in your case we can optimize as though 'rank < 5' was specified. Though I cannot see a valid use case of writing a limit after a windowing expression, as you point out the more common case is a predicate on rank.

        Show
        Harish Butani added a comment - Yes, in your case we can optimize as though 'rank < 5' was specified. Though I cannot see a valid use case of writing a limit after a windowing expression, as you point out the more common case is a predicate on rank.
        Hide
        Ashutosh Chauhan added a comment -

        This is not going to optimize limit with rank like following :

        select * from ( select p_mfgr, rank() over(..) from part) a limit 4;
        

        Rather, this optimization is targeted for rank with filter predicates. It does seem like users are likely to write query with filter predicate given semantics of rank so this may not be an issue, but I think its good to note here so expectations are clear.

        Show
        Ashutosh Chauhan added a comment - This is not going to optimize limit with rank like following : select * from ( select p_mfgr, rank() over(..) from part) a limit 4; Rather, this optimization is targeted for rank with filter predicates. It does seem like users are likely to write query with filter predicate given semantics of rank so this may not be an issue, but I think its good to note here so expectations are clear.
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12651239/HIVE-7063.2.patch

        ERROR: -1 due to 5 failed/errored test(s), 5656 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast1
        org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
        org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
        org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-519/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 5 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12651239

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12651239/HIVE-7063.2.patch ERROR: -1 due to 5 failed/errored test(s), 5656 tests executed Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-519/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed This message is automatically generated. ATTACHMENT ID: 12651239
        Harish Butani made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Harish Butani made changes -
        Attachment HIVE-7063.2.patch [ 12651239 ]
        Harish Butani made changes -
        Field Original Value New Value
        Attachment HIVE-7063.1.patch [ 12648684 ]
        Hide
        Harish Butani added a comment -

        preliminary patch: this adds code to WdwTabFn to react to a rank limit.

        Show
        Harish Butani added a comment - preliminary patch: this adds code to WdwTabFn to react to a rank limit.
        Hide
        Gopal V added a comment -

        This would be exceptionally useful - I have seen at least two implementations of TOPN UDAFs for this.

        Show
        Gopal V added a comment - This would be exceptionally useful - I have seen at least two implementations of TOPN UDAFs for this.
        Harish Butani created issue -

          People

          • Assignee:
            Harish Butani
            Reporter:
            Harish Butani
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development