Hive
  1. Hive
  2. HIVE-6883

Dynamic partitioning optimization does not honor sort order or order by

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.14.0, 0.13.1
    • Component/s: None
    • Labels:
      None

      Description

      HIVE-6455 patch does not honor sort order of the output table or order by of select statement. The reason for the former is numDistributionKey in ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns, because of this RSOp sets the sort columns to null in Key. Since nulls are set in place of sort columns in Key, the sort columns in Value are not sorted.

      The other issue is ORDER BY columns are not honored during insertion. For example

      insert overwrite table over1k_part_orc partition(ds="foo", t) select si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
      

      the select query performs order by on column 'si' in the first MR job. The following MR job (inserted by HIVE-6455), sorts the input data on dynamic partition column 't' without taking into account the already sorted 'si' column. This results in out of order insertion for 'si' column.

      1. HIVE-6883.1.patch
        1.46 MB
        Prasanth Jayachandran
      2. HIVE-6883.2.patch
        1.48 MB
        Prasanth Jayachandran
      3. HIVE-6883.3.patch
        1.48 MB
        Prasanth Jayachandran
      4. HIVE-6883-branch-0.13.3.patch
        1.79 MB
        Prasanth Jayachandran

        Issue Links

          Activity

          Hide
          Prasanth Jayachandran added a comment -

          The fix is small. I added more tests to verify sort by and order by cases. Also replicated the tests for vectorization and tez.

          Show
          Prasanth Jayachandran added a comment - The fix is small. I added more tests to verify sort by and order by cases. Also replicated the tests for vectorization and tez.
          Hide
          Prasanth Jayachandran added a comment -

          Review Board is flaky.. will upload the patch once it is back..

          Show
          Prasanth Jayachandran added a comment - Review Board is flaky.. will upload the patch once it is back..
          Hide
          Prasanth Jayachandran added a comment -

          Harish Butani this is a critical issue as HIVE-6455 fails to honor sort order in DDL.. Will it be possible to include this in 0.13?

          Show
          Prasanth Jayachandran added a comment - Harish Butani this is a critical issue as HIVE-6455 fails to honor sort order in DDL.. Will it be possible to include this in 0.13?
          Hide
          Prasanth Jayachandran added a comment -

          Attaching RB link

          Show
          Prasanth Jayachandran added a comment - Attaching RB link
          Hide
          Harish Butani added a comment - - edited

          I going to say, let's keep this off 0.13
          The user can turn of the dyn partition optimization in the case of sort/order

          Show
          Harish Butani added a comment - - edited I going to say, let's keep this off 0.13 The user can turn of the dyn partition optimization in the case of sort/order
          Hide
          Prasanth Jayachandran added a comment -

          Thanks. No problem. I will change the fix version then. And yes the user can turn off this optimization.

          Show
          Prasanth Jayachandran added a comment - Thanks. No problem. I will change the fix version then. And yes the user can turn off this optimization.
          Hide
          Prasanth Jayachandran added a comment -

          orc_analyze.q test was failing in hadoop-2. Due to inconsistency in between hadoop-1 and hadoop-2 added order by to the test cases.

          Show
          Prasanth Jayachandran added a comment - orc_analyze.q test was failing in hadoop-2. Due to inconsistency in between hadoop-1 and hadoop-2 added order by to the test cases.
          Hide
          Vikram Dixit K added a comment -

          +1 LGTM

          Show
          Vikram Dixit K added a comment - +1 LGTM
          Hide
          Harish Butani added a comment -

          +1 for 0.13

          Show
          Harish Butani added a comment - +1 for 0.13
          Hide
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12639645/HIVE-6883.2.patch

          ERROR: -1 due to 3 failed/errored test(s), 5571 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2209/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2209/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12639645

          Show
          Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639645/HIVE-6883.2.patch ERROR: -1 due to 3 failed/errored test(s), 5571 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2209/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2209/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed This message is automatically generated. ATTACHMENT ID: 12639645
          Hide
          Prasanth Jayachandran added a comment -

          Fixed infer_bucket_sort_dyn_part.q. Other failures are unrelated.

          Show
          Prasanth Jayachandran added a comment - Fixed infer_bucket_sort_dyn_part.q. Other failures are unrelated.
          Hide
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12639717/HIVE-6883.3.patch

          ERROR: -1 due to 3 failed/errored test(s), 5613 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
          org.apache.hive.jdbc.TestJdbcDriver2.testNewConnectionConfiguration
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2218/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2218/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12639717

          Show
          Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12639717/HIVE-6883.3.patch ERROR: -1 due to 3 failed/errored test(s), 5613 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20 org.apache.hive.jdbc.TestJdbcDriver2.testNewConnectionConfiguration Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2218/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/2218/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed This message is automatically generated. ATTACHMENT ID: 12639717
          Hide
          Prasanth Jayachandran added a comment -

          The tests are not related.

          Show
          Prasanth Jayachandran added a comment - The tests are not related.
          Hide
          Sergey Shelukhin added a comment -

          committed to trunk

          Show
          Sergey Shelukhin added a comment - committed to trunk
          Hide
          Prasanth Jayachandran added a comment -

          Sushanth Sowmyan fyi.. attaching patch for branch-0.13.

          Show
          Prasanth Jayachandran added a comment - Sushanth Sowmyan fyi.. attaching patch for branch-0.13.
          Hide
          Sushanth Sowmyan added a comment -

          Thanks for the update, Prashant, I'll test it now.

          Show
          Sushanth Sowmyan added a comment - Thanks for the update, Prashant, I'll test it now.
          Hide
          Sushanth Sowmyan added a comment -

          Thanks, Prashant, this merges correctly.

          Show
          Sushanth Sowmyan added a comment - Thanks, Prashant, this merges correctly.
          Hide
          Thejas M Nair added a comment -

          This jira has been fixed as part of 0.13.1 release. If you find further issues, please create a new jira and link it to this one.

          Show
          Thejas M Nair added a comment - This jira has been fixed as part of 0.13.1 release. If you find further issues, please create a new jira and link it to this one.
          Hide
          Zhichun Wu added a comment -

          @ Prasanth Jayachandran , this fix cause some problems when combine dynamic partitioning with group by. Consider the following case:

          CREATE TABLE `t1`(  `a` int,`b` string) PARTITIONED BY (`dt` string);
          create table src1 (
            `key` string,
            `val` string
          );
          explain insert overwrite table t1 partition(dt) select 1, "hello", "20140901" from src1 group by key;
          

          The key expressions of RS in Stage-2 are wrong. The part of the patch which using the parent RS's keyCols needs more changes.

           if (parentRSOpOrder != null && !parentRSOpOrder.isEmpty() && sortPositions.isEmpty()) {
                    newKeyCols.addAll(parentRSOp.getConf().getKeyCols());
                    orderStr += parentRSOpOrder;
                  }
          
          Show
          Zhichun Wu added a comment - @ Prasanth Jayachandran , this fix cause some problems when combine dynamic partitioning with group by. Consider the following case: CREATE TABLE `t1`( `a` int ,`b` string) PARTITIONED BY (`dt` string); create table src1 ( `key` string, `val` string ); explain insert overwrite table t1 partition(dt) select 1, "hello" , "20140901" from src1 group by key; The key expressions of RS in Stage-2 are wrong. The part of the patch which using the parent RS's keyCols needs more changes. if (parentRSOpOrder != null && !parentRSOpOrder.isEmpty() && sortPositions.isEmpty()) { newKeyCols.addAll(parentRSOp.getConf().getKeyCols()); orderStr += parentRSOpOrder; }
          Hide
          Prasanth Jayachandran added a comment -

          Zhichun Wu Thanks for reporting the issue. I will look at it and will post back a fix in a new jira.

          Show
          Prasanth Jayachandran added a comment - Zhichun Wu Thanks for reporting the issue. I will look at it and will post back a fix in a new jira.
          Hide
          Prasanth Jayachandran added a comment -

          Zhichun Wu HIVE-8162 has fix for the issue.

          Show
          Prasanth Jayachandran added a comment - Zhichun Wu HIVE-8162 has fix for the issue.
          Hide
          Zhichun Wu added a comment -

          @ Prasanth Jayachandran , thank you for the fix

          Show
          Zhichun Wu added a comment - @ Prasanth Jayachandran , thank you for the fix
          Hide
          Prasanth Jayachandran added a comment -

          Zhichun Wu The fix needs more revision wrt subqueries. Currently it doesn't preserve the ordering of records if any subqueries has order by. I am working on it and will post an update to that fix.

          Show
          Prasanth Jayachandran added a comment - Zhichun Wu The fix needs more revision wrt subqueries. Currently it doesn't preserve the ordering of records if any subqueries has order by. I am working on it and will post an update to that fix.

            People

            • Assignee:
              Prasanth Jayachandran
              Reporter:
              Prasanth Jayachandran
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development