Hive / HIVE-3289

sort merge join may not work silently

Details

    • This patch adds the configuration property 'hive.enforce.sortmergebucketmapjoin', which is set to false by default.

    Description

      The user has no way to tell whether the sort-merge join is actually being used.

      create table table_asc(key int, value string) CLUSTERED BY (key) SORTED BY (key asc)
      INTO 1 BUCKETS STORED AS RCFILE;
      create table table_desc(key int, value string) CLUSTERED BY (key) SORTED BY (key desc)
      INTO 1 BUCKETS STORED AS RCFILE;

      set hive.enforce.sorting = true;

      insert overwrite table table_asc select key, value from src;
      insert overwrite table table_desc select key, value from src;

      set hive.optimize.bucketmapjoin = true;
      set hive.optimize.bucketmapjoin.sortedmerge = true;
      set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

      explain
      select /*+mapjoin(a)*/ * from table_asc a join table_desc b on a.key = b.key;
      select /*+mapjoin(a)*/ * from table_asc a join table_desc b on a.key = b.key;

      explain
      select /*+mapjoin(b)*/ * from table_asc a join table_desc b on a.key = b.key;
      select /*+mapjoin(b)*/ * from table_asc a join table_desc b on a.key = b.key;

      In the above test, the sort-merge join is, as expected, not performed.
      If the user has explicitly asked for a sort-merge join and it cannot be
      honored, the operation should fail.
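
      A minimal sketch of the requested behavior, assuming the
      'hive.enforce.sortmergebucketmapjoin' property added by the patch is what
      turns the silent fallback into a failure. The exact error is not stated
      here, so none is shown:

      ```sql
      -- Same session settings as in the test above.
      set hive.optimize.bucketmapjoin = true;
      set hive.optimize.bucketmapjoin.sortedmerge = true;
      set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

      -- Assumption: enabling this property makes a query fail when the
      -- requested sort-merge join cannot be applied. It is false by default.
      set hive.enforce.sortmergebucketmapjoin = true;

      -- table_asc and table_desc are sorted in opposite orders, so under the
      -- assumption above this query is expected to fail instead of silently
      -- running as a different kind of join.
      select /*+mapjoin(a)*/ * from table_asc a join table_desc b on a.key = b.key;
      ```

      With the property left at its default of false, the same query would
      presumably keep the behavior described above.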

      Attachments

        1. hive.3289.1.patch
          14 kB
          Namit Jain

            People

              namit Namit Jain