HIVE-3289

sort merge join may not work silently

    Details

    • Release Note:
      This patch adds the configuration property 'hive.enforce.sortmergebucketmapjoin', which is set to false by default.

      Description

      The user cannot tell whether the sort-merge join is actually being used.

      create table table_asc(key int, value string) CLUSTERED BY (key) SORTED BY (key asc)
      INTO 1 BUCKETS STORED AS RCFILE;
      create table table_desc(key int, value string) CLUSTERED BY (key) SORTED BY (key desc)
      INTO 1 BUCKETS STORED AS RCFILE;

      set hive.enforce.sorting = true;

      insert overwrite table table_asc select key, value from src;
      insert overwrite table table_desc select key, value from src;

      set hive.optimize.bucketmapjoin = true;
      set hive.optimize.bucketmapjoin.sortedmerge = true;
      set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

      explain
      select /*+ mapjoin(a) */ * from table_asc a join table_desc b on a.key = b.key;
      select /*+ mapjoin(a) */ * from table_asc a join table_desc b on a.key = b.key;

      explain
      select /*+ mapjoin(b) */ * from table_asc a join table_desc b on a.key = b.key;
      select /*+ mapjoin(b) */ * from table_asc a join table_desc b on a.key = b.key;

      In the above test, the request for a sort-merge join is silently not
      honored (the two tables are sorted on key in opposite orders). If the
      user explicitly asked for a sort-merge join and it cannot be honored,
      the operation should fail.
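
      A minimal sketch of how the property named in the release note might be used with the tables above. It assumes that setting 'hive.enforce.sortmergebucketmapjoin' to true makes the query fail when the requested sort-merge join cannot be performed; the exact error message is not reproduced here.

      ```sql
      -- Same session settings as in the test above.
      set hive.optimize.bucketmapjoin = true;
      set hive.optimize.bucketmapjoin.sortedmerge = true;
      set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

      -- Property added by this patch (default: false).
      -- Assumption: when true, the query below is rejected instead of
      -- silently running as a different kind of join.
      set hive.enforce.sortmergebucketmapjoin = true;

      select /*+ mapjoin(a) */ * from table_asc a join table_desc b on a.key = b.key;
      ```

      With the default value (false), the same query would presumably keep its earlier behavior and run without reporting that the sort-merge join was skipped.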

          Activity

          Namit Jain added a comment -

          https://reviews.facebook.net/D4179

          Kevin Wilfong added a comment -

          +1 running tests

          Carl Steinbach added a comment - edited

          -1

          I also am not a fan of hive.mapred.mode. If you turn it off, you may unintentionally turn off other checks, and it uses strict/nonstrict instead of true/false which is easier to validate. That's, at best, a problem for another JIRA, though, as it's fairly well established.

          I agree with Kevin, but I don't think this should be postponed for another JIRA. Please add a new configuration property now instead of further overloading what is an already ill-defined and poorly documented configuration property.

          Carl Steinbach added a comment -

          Two more points which are tangentially related:

          • The patch is not attached to this ticket, and it looks like Phabricator stopped automatically attaching patches some time ago. Is anyone at Facebook looking into fixing this?
          • Part of the agreement when we started using Phabricator was that the tool would automatically copy review comments back to JIRA. This feature hasn't worked in months, and unless it starts working soon, I think we should stop using Phabricator and switch back to ReviewBoard. Is anyone looking into fixing this? If not, we should probably just switch back now.

          Namit Jain added a comment -

          https://reviews.facebook.net/D4377

          Added a new configuration parameter.

          Namit Jain added a comment -

          I think the discussion about whether to use Phabricator, ReviewBoard, or patches should happen on the dev mailing list instead of in this JIRA.

          Kevin Wilfong added a comment -

          Regarding the diff, I'm +1 on it. Carl?

          Carl Steinbach added a comment -

          +1. Thanks for making these changes.

          Kevin Wilfong added a comment -

          Committed, thanks Namit.

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1584 (See https://builds.apache.org/job/Hive-trunk-h0.21/1584/)
          HIVE-3289. sort merge join may not work silently. (njain via kevinwilfong) (Revision 1368119)

          Result = FAILURE
          kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368119
          Files :

          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          • /hive/trunk/conf/hive-default.xml.template
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeBucketMapJoinOptimizer.java
          • /hive/trunk/ql/src/test/queries/clientnegative/sortmerge_mapjoin_mismatch_1.q
          • /hive/trunk/ql/src/test/results/clientnegative/sortmerge_mapjoin_mismatch_1.q.out

          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-3289. sort merge join may not work silently. (njain via kevinwilfong) (Revision 1368119)

          Result = ABORTED
          kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368119
          Files :

          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          • /hive/trunk/conf/hive-default.xml.template
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeBucketMapJoinOptimizer.java
          • /hive/trunk/ql/src/test/queries/clientnegative/sortmerge_mapjoin_mismatch_1.q
          • /hive/trunk/ql/src/test/results/clientnegative/sortmerge_mapjoin_mismatch_1.q.out

          Ashutosh Chauhan added a comment -

          This issue is fixed and was released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.


            People

            • Assignee: Namit Jain
            • Reporter: Namit Jain