Hive / HIVE-9382

Query is rerun when Global Limit optimization is on and Fetch optimization is off

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 1.1.0
    • Component/s: Physical Optimizer
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      HIVE-9382: Fix Global Limit optimization when Fetch optimizations are off (Wei Zheng, reviewed by Gopal V)

      Description

      When Global Limit optimization is enabled and Fetch optimization for simple queries is off or not applicable, some queries with a LIMIT clause run twice. The relevant settings are:
      set hive.limit.optimize.enable=true;
      set hive.fetch.task.conversion=none;

      For example,

      hive> select * from t1 limit 10;
      Query ID = wzheng_20150107185252_4a6d0e65-9e58-464b-9ed3-9177740c30a9
      Total jobs = 1
      Launching Job 1 out of 1
      
      
      Status: Running (Executing on YARN cluster with App id application_1420567249453_0039)
      
      --------------------------------------------------------------------------------
              VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
      --------------------------------------------------------------------------------
      Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
      --------------------------------------------------------------------------------
      VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 0.41 s
      --------------------------------------------------------------------------------
      OK
      201208	99848	119820	32627	982976	509206	0.000100898
      201208	99745	119820	32627	982976	509206	0.000100898
      201208	99739	119820	32627	982976	509206	0.000100898
      201208	99847	119820	32627	982976	509206	0.000100898
      201208	613588	119820	32627	982976	509206	0.000100898
      201208	99809	119820	32627	982976	509206	0.000100898
      201208	99725	119820	32627	982976	509206	0.000100898
      201208	99666	119820	32627	982976	509206	0.000100898
      201208	99743	119820	32627	982976	509206	0.000100898
      201208	99801	119820	32627	982976	509206	0.000100898
      Retry query with a different approach...
      Query ID = wzheng_20150107185252_8a77f793-cad7-4c6b-b64a-07d8310970b9
      Total jobs = 1
      Launching Job 1 out of 1
      
      
      Status: Running (Executing on YARN cluster with App id application_1420567249453_0039)
      
      --------------------------------------------------------------------------------
              VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
      --------------------------------------------------------------------------------
      Map 1 ..........   SUCCEEDED    309        309        0        0       0       0
      --------------------------------------------------------------------------------
      VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 2.04 s
      --------------------------------------------------------------------------------
      OK
      201208	99848	119820	32627	982976	509206	0.000100898
      201208	99745	119820	32627	982976	509206	0.000100898
      201208	99739	119820	32627	982976	509206	0.000100898
      201208	99847	119820	32627	982976	509206	0.000100898
      201208	613588	119820	32627	982976	509206	0.000100898
      201208	99809	119820	32627	982976	509206	0.000100898
      201208	99725	119820	32627	982976	509206	0.000100898
      201208	99666	119820	32627	982976	509206	0.000100898
      201208	99743	119820	32627	982976	509206	0.000100898
      201208	99801	119820	32627	982976	509206	0.000100898
      Time taken: 2.748 seconds, Fetched: 10 row(s)
      

        Activity

        gopalv Gopal V added a comment -

        +1 LGTM.

        wzheng Wei Zheng added a comment -

        The test failure is unrelated to this patch.

        hiveqa Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12692410/HIVE-9382.1.patch

        ERROR: -1 due to 1 failed/errored test(s), 7315 tests executed
        Failed tests:

        org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2389/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2389/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2389/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 1 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12692410 - PreCommit-HIVE-TRUNK-Build

        wzheng Wei Zheng added a comment -

        Here's how to reproduce the problem:

        set hive.limit.optimize.enable=true;
        set hive.limit.optimize.limit.file=2;
        set hive.limit.row.max.size=100;
        set hive.fetch.task.conversion=none;
        
        create table t1 (key int, value string) stored as textfile;
        load data local inpath '../../data/files/srcbucket20.txt' INTO TABLE t1;
        load data local inpath '../../data/files/srcbucket20.txt' INTO TABLE t1;
        load data local inpath '../../data/files/srcbucket20.txt' INTO TABLE t1;
        
        select * from t1 limit 1;
        
        wzheng Wei Zheng added a comment -

        Uploaded the first patch.

        wzheng Wei Zheng added a comment -

        The problem occurs when 10 rows (in this example) have already been fetched and output, but FetchTask still tries to retrieve more rows because the minimum number of rows required per task is greater than 0. This is wrong: no more rows are needed, so FetchTask.fetch should simply return without doing anything.
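A minimal sketch of the early return described above, written as a toy Java model rather than Hive's actual FetchTask code: the class LimitAwareFetcher, its fields, and the batch size are hypothetical. It counts the rows already output and, once that count reaches the LIMIT, returns immediately instead of asking for more rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of a LIMIT-aware fetch loop. The class and its fields are
// hypothetical illustrations, not Hive's actual FetchTask implementation.
class LimitAwareFetcher {
    private final int limit;             // the query's LIMIT n; negative means "no limit"
    private final Iterator<String> rows; // stand-in for the job's output rows
    private int totalRows = 0;           // rows already fetched and output

    LimitAwareFetcher(int limit, Iterator<String> rows) {
        this.limit = limit;
        this.rows = rows;
    }

    // Appends up to maxRows rows to out; returns false when nothing more is fetched.
    boolean fetch(List<String> out, int maxRows) {
        // The guard: once LIMIT rows have been output, return immediately
        // instead of trying to retrieve more rows.
        if (limit >= 0 && totalRows >= limit) {
            return false;
        }
        // Never ask for more rows than the LIMIT still allows.
        int budget = (limit >= 0) ? Math.min(maxRows, limit - totalRows) : maxRows;
        int fetched = 0;
        while (fetched < budget && rows.hasNext()) {
            out.add(rows.next());
            fetched++;
        }
        totalRows += fetched;
        return fetched > 0;
    }

    int totalRows() {
        return totalRows;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            source.add("row" + i);
        }
        LimitAwareFetcher fetcher = new LimitAwareFetcher(10, source.iterator());
        List<String> out = new ArrayList<>();
        while (fetcher.fetch(out, 4)) {
            // keep fetching in batches of 4 until the limit guard stops the loop
        }
        System.out.println(out.size() + " rows fetched");
    }
}
```

Running main prints `10 rows fetched`: the third call fetches only the 2 rows still needed, and the fourth call hits the guard and returns without touching the input.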

        wzheng Wei Zheng added a comment -

        The second run of the query actually turns off Global Limit optimization, which is why it does not keep retrying indefinitely.
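The retry behavior can be pictured with another toy Java model; the class GlobalLimitRetrySketch and its parameters are hypothetical and do not reproduce Hive's Driver logic. The first run reads only a sample of the input; a retry happens only when that run returns fewer rows than the LIMIT, and the retry turns the optimization off, so there is at most one retry. In the example from the description the first run already returned all 10 rows, so under this model no retry would be needed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names, not Hive's Driver code) of running a LIMIT
// query with Global Limit optimization and retrying once without it.
class GlobalLimitRetrySketch {

    // One run: with the optimization on, only a sample of the input is read.
    static int runOnce(int limit, boolean limitOptimization, int sampledRows, int totalRows) {
        int available = limitOptimization ? sampledRows : totalRows;
        return Math.min(limit, available);
    }

    // Returns the number of runs executed; each run is recorded in log.
    static int execute(int limit, int sampledRows, int totalRows, List<String> log) {
        boolean limitOptimization = true;
        int runs = 0;
        while (true) {
            runs++;
            int rows = runOnce(limit, limitOptimization, sampledRows, totalRows);
            log.add("run " + runs + ": optimization=" + limitOptimization + ", rows=" + rows);
            // Retry only if the sampled run came up short while the optimization was on.
            // The retry turns the optimization off, so it cannot trigger another retry.
            if (rows < limit && limitOptimization) {
                limitOptimization = false;
                continue;
            }
            return runs;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // The sample already satisfies LIMIT 10: a single run is enough.
        System.out.println(execute(10, 10, 1000, log) + " run(s)");
        log.clear();
        // The sample has only 5 rows: one retry over the full input.
        System.out.println(execute(10, 5, 1000, log) + " run(s)");
        log.forEach(System.out::println);
    }
}
```

Once the optimization is off, `rows < limit` can no longer start another run, even if the table holds fewer rows than the LIMIT, which mirrors why the rerun in the report did not repeat.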


          People

          • Assignee:
            wzheng Wei Zheng
            Reporter:
            wzheng Wei Zheng
          • Votes:
            0
            Watchers:
            2
