Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9132 CBO: Calcite Operator To Hive Operator (Calcite Return Path)
  3. HIVE-14590

CBO (Calcite Return Path) Incorrect result set when limit is present in one of the union branches

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.0.0, 2.1.0
    • 2.2.0
    • Logical Optimizer
    • None

    Description

      Limit gets propagated outside union.

      Attachments

        1. HIVE-14590.1.patch
          5 kB
          Ashutosh Chauhan
        2. HIVE-14590.patch
          3 kB
          Ashutosh Chauhan

        Activity

          Pushed to master. Created HIVE-14703 as a follow-up.

          ashutoshc Ashutosh Chauhan added a comment - Pushed to master. Created HIVE-14703 as a follow-up.

          For the time being, this should be fine. As there might be other cases when the limit is not the top most operator, but still global limit optimization can be applied, I think it would be worth to create a follow-up to identify the global limit for a given query i.e. identifying which specific operators 'reset' global limit optimization. Could you do that?

          +1

          jcamacho JesĂşs Camacho RodrĂ­guez added a comment - - edited For the time being, this should be fine. As there might be other cases when the limit is not the top most operator, but still global limit optimization can be applied, I think it would be worth to create a follow-up to identify the global limit for a given query i.e. identifying which specific operators 'reset' global limit optimization. Could you do that? +1
          hiveqa Hive QA added a comment -

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12826279/HIVE-14590.1.patch

          SUCCESS: +1 due to 1 test(s) being added or modified.

          ERROR: -1 due to 5 failed/errored test(s), 10502 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
          org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_null]
          org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
          

          Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1053/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1053/console
          Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1053/

          Messages:

          Executing org.apache.hive.ptest.execution.TestCheckPhase
          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 5 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12826279 - PreCommit-HIVE-MASTER-Build

          hiveqa Hive QA added a comment - Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12826279/HIVE-14590.1.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 5 failed/errored test(s), 10502 tests executed Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_null] org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1053/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1053/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1053/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed This message is automatically generated. ATTACHMENT ID: 12826279 - PreCommit-HIVE-MASTER-Build

          In addition to union, this may happen in other scenarios also. So, better approach is to handle it at top level. Modified patch to reflect that.

          ashutoshc Ashutosh Chauhan added a comment - In addition to union, this may happen in other scenarios also. So, better approach is to handle it at top level. Modified patch to reflect that.

          I was thinking that maybe a subquery nested within another subquery could happen, and we only pull up SortLimit if hive.optimize.limittranspose is on. Thus, I hold the +1; I think we need to not only check children, but rather traverse the subtree below the Union and check its descendants.

          jcamacho JesĂşs Camacho RodrĂ­guez added a comment - - edited I was thinking that maybe a subquery nested within another subquery could happen, and we only pull up SortLimit if hive.optimize.limittranspose is on. Thus, I hold the +1; I think we need to not only check children, but rather traverse the subtree below the Union and check its descendants.

          I think so too, since it doesn't make sense to have any operator executing any logic after limit has been applied. Although I can't find any specific piece of code which guarantees this.

          ashutoshc Ashutosh Chauhan added a comment - I think so too, since it doesn't make sense to have any operator executing any logic after limit has been applied. Although I can't find any specific piece of code which guarantees this.

          OK, thanks for the explanation, that makes sense.

          In that case, could you confirm that it is guaranteed that the limit operator will be just below the union? I think so, if I remember correctly... If that is guaranteed, patch LGTM, +1

          jcamacho JesĂşs Camacho RodrĂ­guez added a comment - - edited OK, thanks for the explanation, that makes sense. In that case, could you confirm that it is guaranteed that the limit operator will be just below the union? I think so, if I remember correctly... If that is guaranteed, patch LGTM, +1

          Problem is outerQueryLimit field is used to indicate limit on query. As a result fetch operator only fetches those many rows. Assume query is like (subq1 limit 5) union (subq2 limit 5). Because of this bug limit in subquery becomes global limit of 5 and instead of 10 rows only 5 are fetched.
          In the proposed fix, during Union operator conversion we check if there is any limit operator in subtree and clear out this outqueryLimit in such a case.

          ashutoshc Ashutosh Chauhan added a comment - Problem is outerQueryLimit field is used to indicate limit on query. As a result fetch operator only fetches those many rows. Assume query is like (subq1 limit 5) union (subq2 limit 5). Because of this bug limit in subquery becomes global limit of 5 and instead of 10 rows only 5 are fetched. In the proposed fix, during Union operator conversion we check if there is any limit operator in subtree and clear out this outqueryLimit in such a case.

          Could you give me a bit of context on this (why it should be disabled if there is a limit below the union)? Shouldn't we actually check the value of that limit to know whether it is greater or less?

          jcamacho JesĂşs Camacho RodrĂ­guez added a comment - - edited Could you give me a bit of context on this (why it should be disabled if there is a limit below the union)? Shouldn't we actually check the value of that limit to know whether it is greater or less?

          jcamachorodriguez Can you please review this?

          ashutoshc Ashutosh Chauhan added a comment - jcamachorodriguez Can you please review this?
          hiveqa Hive QA added a comment -

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12824661/HIVE-14590.patch

          SUCCESS: +1 due to 1 test(s) being added or modified.

          ERROR: -1 due to 10 failed/errored test(s), 10444 tests executed
          Failed tests:

          TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
          org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_null]
          org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
          org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
          

          Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/945/testReport
          Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/945/console
          Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-945/

          Messages:

          Executing org.apache.hive.ptest.execution.TestCheckPhase
          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 10 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12824661 - PreCommit-HIVE-MASTER-Build

          hiveqa Hive QA added a comment - Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12824661/HIVE-14590.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 10 failed/errored test(s), 10444 tests executed Failed tests: TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1] org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_null] org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/945/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/945/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-945/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed This message is automatically generated. ATTACHMENT ID: 12824661 - PreCommit-HIVE-MASTER-Build

          Fixes union_null.q wrong results.
          cc: vgarg

          ashutoshc Ashutosh Chauhan added a comment - Fixes union_null.q wrong results. cc: vgarg

          People

            ashutoshc Ashutosh Chauhan
            ashutoshc Ashutosh Chauhan
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: