For the time being, this should be fine. As there might be other cases when the limit is not the top most operator, but still global limit optimization can be applied, I think it would be worth to create a follow-up to identify the global limit for a given query i.e. identifying which specific operators 'reset' global limit optimization. Could you do that?
+1
JesĂşs Camacho RodrĂguez
added a comment - - edited For the time being, this should be fine. As there might be other cases when the limit is not the top most operator, but still global limit optimization can be applied, I think it would be worth to create a follow-up to identify the global limit for a given query i.e. identifying which specific operators 'reset' global limit optimization. Could you do that?
+1
Hive QA
added a comment -
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826279/HIVE-14590.1.patch
SUCCESS: +1 due to 1 test(s) being added or modified.
ERROR: -1 due to 5 failed/errored test(s), 10502 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_null]
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1053/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1053/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1053/
Messages:
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
This message is automatically generated.
ATTACHMENT ID: 12826279 - PreCommit-HIVE-MASTER-Build
In addition to union, this may happen in other scenarios also. So, better approach is to handle it at top level. Modified patch to reflect that.
Ashutosh Chauhan
added a comment - In addition to union, this may happen in other scenarios also. So, better approach is to handle it at top level. Modified patch to reflect that.
I was thinking that maybe a subquery nested within another subquery could happen, and we only pull up SortLimit if hive.optimize.limittranspose is on. Thus, I hold the +1; I think we need to not only check children, but rather traverse the subtree below the Union and check its descendants.
JesĂşs Camacho RodrĂguez
added a comment - - edited I was thinking that maybe a subquery nested within another subquery could happen, and we only pull up SortLimit if hive.optimize.limittranspose is on. Thus, I hold the +1; I think we need to not only check children, but rather traverse the subtree below the Union and check its descendants.
I think so too, since it doesn't make sense to have any operator executing any logic after limit has been applied. Although I can't find any specific piece of code which guarantees this.
Ashutosh Chauhan
added a comment - I think so too, since it doesn't make sense to have any operator executing any logic after limit has been applied. Although I can't find any specific piece of code which guarantees this.
In that case, could you confirm that it is guaranteed that the limit operator will be just below the union? I think so, if I remember correctly... If that is guaranteed, patch LGTM, +1
JesĂşs Camacho RodrĂguez
added a comment - - edited OK, thanks for the explanation, that makes sense.
In that case, could you confirm that it is guaranteed that the limit operator will be just below the union? I think so, if I remember correctly... If that is guaranteed, patch LGTM, +1
Problem is outerQueryLimit field is used to indicate limit on query. As a result fetch operator only fetches those many rows. Assume query is like (subq1 limit 5) union (subq2 limit 5). Because of this bug limit in subquery becomes global limit of 5 and instead of 10 rows only 5 are fetched.
In the proposed fix, during Union operator conversion we check if there is any limit operator in subtree and clear out this outqueryLimit in such a case.
Ashutosh Chauhan
added a comment - Problem is outerQueryLimit field is used to indicate limit on query. As a result fetch operator only fetches those many rows. Assume query is like (subq1 limit 5) union (subq2 limit 5). Because of this bug limit in subquery becomes global limit of 5 and instead of 10 rows only 5 are fetched.
In the proposed fix, during Union operator conversion we check if there is any limit operator in subtree and clear out this outqueryLimit in such a case.
Could you give me a bit of context on this (why it should be disabled if there is a limit below the union)? Shouldn't we actually check the value of that limit to know whether it is greater or less?
JesĂşs Camacho RodrĂguez
added a comment - - edited Could you give me a bit of context on this (why it should be disabled if there is a limit below the union)? Shouldn't we actually check the value of that limit to know whether it is greater or less?
SUCCESS: +1 due to 1 test(s) being added or modified.
ERROR: -1 due to 10 failed/errored test(s), 10444 tests executed Failed tests:
TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_null]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
Hive QA
added a comment -
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12824661/HIVE-14590.patch
SUCCESS: +1 due to 1 test(s) being added or modified.
ERROR: -1 due to 10 failed/errored test(s), 10444 tests executed
Failed tests:
TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_null]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/945/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/945/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-945/
Messages:
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
This message is automatically generated.
ATTACHMENT ID: 12824661 - PreCommit-HIVE-MASTER-Build
Pushed to master. Created HIVE-14703 as a follow-up.