Details
Bug
Status: Closed
Major
Resolution: Fixed
0.10.0
None
This patch adds the configuration property 'hive.enforce.sortmergebucketmapjoin', which is set to false by default.
Description
The user has no way of knowing whether the sort-merge join is actually being used.
create table table_asc(key int, value string) CLUSTERED BY (key) SORTED BY (key asc)
INTO 1 BUCKETS STORED AS RCFILE;
create table table_desc(key int, value string) CLUSTERED BY (key) SORTED BY (key desc)
INTO 1 BUCKETS STORED AS RCFILE;
set hive.enforce.sorting = true;
insert overwrite table table_asc select key, value from src;
insert overwrite table table_desc select key, value from src;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
explain
select /*+ mapjoin(a) */ * from table_asc a join table_desc b on a.key = b.key;
select /*+ mapjoin(a) */ * from table_asc a join table_desc b on a.key = b.key;
explain
select /*+ mapjoin(b) */ * from table_asc a join table_desc b on a.key = b.key;
select /*+ mapjoin(b) */ * from table_asc a join table_desc b on a.key = b.key;
In the above test, the sort-merge join is not obeyed as expected.
If the user explicitly asked for a sort-merge join and it is not being
obeyed, the operation should fail.
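For illustration, the requested behaviour could be exercised roughly as follows with the property named in the release note. This is only a sketch: it assumes table_asc and table_desc from the test above already exist and are populated, and it presumes the join is rejected because the two tables are sorted in opposite orders. The exact error raised is not shown, since the description does not state it.

```sql
-- Assumes table_asc and table_desc were created and loaded as in the test above.
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Property added by this patch; it is false by default, so it must be enabled explicitly.
set hive.enforce.sortmergebucketmapjoin = true;

-- Presumption: the sort orders differ (asc vs. desc), so the sort-merge join
-- cannot be honoured. With enforcement on, the query is expected to fail
-- instead of running without the sort-merge join.
select /*+ mapjoin(a) */ * from table_asc a join table_desc b on a.key = b.key;
```

With the property left at its default of false, the same query would presumably keep the behaviour shown in the test above, where the request is not obeyed and no failure is reported.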