Hive / HIVE-2068

Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" launches a MapReduce job whose input is the whole table or partition. The latency can be huge if the table or partition is large. We could reduce the number of input files to speed up such queries.
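      The description states the goal but not the selection rule. A minimal Java sketch of one way to pick a reduced input, assuming each row is at most some fixed number of bytes: take files in a stable order until their total size could hold n rows or a file cap is reached. The class, method, and parameters (LimitInputSelector, select, maxRowBytes, maxFiles) are hypothetical illustrations, not code from the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the HIVE-2068 patch: pick only as many input files
// as are likely needed to produce "LIMIT n" rows.
public class LimitInputSelector {

  /** Minimal stand-in for an input file: its path and length in bytes. */
  public static final class InputFile {
    public final String path;
    public final long lengthBytes;

    public InputFile(String path, long lengthBytes) {
      this.path = path;
      this.lengthBytes = lengthBytes;
    }
  }

  /**
   * @param files       all candidate input files of the table or partition
   * @param limit       the n in "LIMIT n"
   * @param maxRowBytes assumed upper bound on the size of one row (hypothetical knob)
   * @param maxFiles    cap on the number of files picked (hypothetical knob)
   * @return the chosen files, in path order
   */
  public static List<InputFile> select(List<InputFile> files, long limit,
                                       long maxRowBytes, int maxFiles) {
    if (limit < 0 || maxRowBytes <= 0 || maxFiles <= 0) {
      throw new IllegalArgumentException(
          "limit must be >= 0; maxRowBytes and maxFiles must be > 0");
    }
    // Bytes to read so that n rows are probably present; saturate on overflow.
    long targetBytes = limit > Long.MAX_VALUE / maxRowBytes
        ? Long.MAX_VALUE : limit * maxRowBytes;

    // Sort by path so the same files are chosen on every run; the order of a
    // raw file listing is not guaranteed to be deterministic.
    List<InputFile> sorted = new ArrayList<>(files);
    sorted.sort(Comparator.comparing((InputFile f) -> f.path));

    List<InputFile> chosen = new ArrayList<>();
    long totalBytes = 0;
    for (InputFile f : sorted) {
      if (totalBytes >= targetBytes || chosen.size() >= maxFiles) {
        break;
      }
      chosen.add(f);
      totalBytes += f.lengthBytes;
    }
    return chosen;
  }
}
```

      Because the row-size bound is only an estimate, the chosen files can still hold fewer than n rows, so any such optimization needs a fallback to the full input.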

      Attachments

      1. HIVE-2068.6.patch
        79 kB
        Siying Dong
      2. HIVE-2068.5.patch
        79 kB
        Siying Dong
      3. HIVE-2068.4.patch
        80 kB
        Siying Dong
      4. HIVE-2068.3.patch
        78 kB
        Siying Dong
      5. HIVE-2068.2.patch
        77 kB
        Siying Dong
      6. HIVE-2068.1.patch
        72 kB
        Siying Dong

        Activity

        Namit Jain added a comment -

        Committed. Thanks, Siying.

        Siying Dong added a comment -

        Looks like a simple "... limit ..." query depends on the order in which files are listed, which is not deterministic. I modified the test case to always use the same 3 files so that the results are deterministic.

        Namit Jain added a comment -

        Can you rerun the tests?
        I am getting some failures in global_limit.q.

        Siying Dong added a comment -

        Deleted the latest patch. The FetchTask return logic is actually OK.

        Siying Dong added a comment -

        Found a problem with the most recently modified piece of code.

        Siying Dong added a comment -

        Fixed the issue. I think what Namit means is that the function should always return true (no more rows).

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/540/
        -----------------------------------------------------------

        (Updated 2011-04-15 18:37:21.441402)

        Review request for hive and namit jain.

        Changes
        -------

        Fixed a small logic bug.

        Summary
        -------

        For HIVE-2068

        This addresses bug HIVE-2068.
        https://issues.apache.org/jira/browse/HIVE-2068

        Diffs (updated)


        trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1091258
        trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1091258
        trunk/conf/hive-default.xml 1091258
        trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/CommandNeedRetryException.java PRE-CREATION
        trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LimitDesc.java 1091258
        trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java 1091258
        trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1091258
        trunk/ql/src/test/queries/clientpositive/global_limit.q PRE-CREATION
        trunk/ql/src/test/results/clientpositive/global_limit.q.out PRE-CREATION
        trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1091258

        Diff: https://reviews.apache.org/r/540/diff

        Testing
        -------

        Added a test to the test suite.

        Thanks,

        Siying

        Namit Jain added a comment -

        FetchTask: return false once the required number of rows has been found.
        Otherwise, it looks good.

        Namit Jain added a comment -

        Can you regenerate the patch?
        I am getting a lot of conflicts.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/540/
        -----------------------------------------------------------

        Review request for hive and namit jain.

        Summary
        -------

        For HIVE-2068

        This addresses bug HIVE-2068.
        https://issues.apache.org/jira/browse/HIVE-2068

        Diffs


        trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1086466
        trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1086466
        trunk/conf/hive-default.xml 1086466
        trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/CommandNeedRetryException.java PRE-CREATION
        trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LimitDesc.java 1086466
        trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java 1086466
        trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1086466
        trunk/ql/src/test/queries/clientpositive/global_limit.q PRE-CREATION
        trunk/ql/src/test/results/clientpositive/global_limit.q.out PRE-CREATION
        trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1086466

        Diff: https://reviews.apache.org/r/540/diff

        Testing
        -------

        Added a test to the test suite.

        Thanks,

        Siying

        Siying Dong added a comment -

        Namit, can't you see that trunk/conf/hive-default.xml is already included in the diff on the review board?

        Namit Jain added a comment -

        Siying, I don't see the new changes.

        Siying Dong added a comment -

        Review board updated.

        Namit Jain added a comment -

        Can you update the review board entry?

        Siying Dong added a comment -

        Addressing Namit's comments.

        Namit Jain added a comment -

        Comments are in the review board.

        Siying Dong added a comment -

        https://reviews.apache.org/r/540/diff/

        Siying Dong added a comment -

        The previous patch missed a file.

        Namit Jain added a comment -

        Can you add a review board entry?

        Siying Dong added a comment -

        The features are mostly finished and I have done some manual testing.
        I'm still running all the tests. I'm also thinking about how to add tests that cover the Driver changes for retry.
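        The comment above mentions Driver changes for retry, and the diff lists a new CommandNeedRetryException.java, but neither is shown here. A minimal Java sketch of a retry-on-too-few-rows flow, assuming the first attempt runs on reduced input and the retry scans everything. All names (LimitRetrySketch, QueryExecutor, NeedRetryException) are hypothetical local stand-ins, not Hive's classes.

```java
import java.util.List;

// Hypothetical sketch, not the HIVE-2068 Driver code: run a LIMIT query on
// reduced input first, and rerun it on the full input if too few rows came back.
public class LimitRetrySketch {

  /** Runs the query; reducedInput selects the cheap plan over a subset of files. */
  public interface QueryExecutor {
    List<String> execute(boolean reducedInput);
  }

  /** Local stand-in for "this command must be retried"; not Hive's class. */
  public static class NeedRetryException extends Exception {
    public NeedRetryException(String message) {
      super(message);
    }
  }

  /** First attempt: reduced input; signals a retry if it could not fill the limit. */
  static List<String> runReduced(QueryExecutor executor, int limit)
      throws NeedRetryException {
    List<String> rows = executor.execute(true);
    if (rows.size() < limit) {
      throw new NeedRetryException(
          "reduced input produced " + rows.size() + " of " + limit + " rows");
    }
    return rows;
  }

  /** Driver-side flow: try the cheap plan, fall back to scanning everything. */
  public static List<String> run(QueryExecutor executor, int limit) {
    try {
      return runReduced(executor, limit);
    } catch (NeedRetryException e) {
      // The subset of files may simply hold fewer than `limit` rows, so only
      // the full table or partition can give the correct answer.
      return executor.execute(false);
    }
  }
}
```

        This sketch always retries when the limit is not filled; a real implementation could skip the retry when the reduced input already covered every file.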


          People

          • Assignee:
            Siying Dong
            Reporter:
            Siying Dong
          • Votes:
            0
            Watchers:
            1
