Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Run the following on MiniTez:
set hive.mapred.mode=nonstrict;
SET hive.vectorized.execution.enabled=true;
SET hive.exec.orc.default.buffer.size=32768;
SET hive.exec.orc.default.row.index.stride=1000;
SET hive.optimize.index.filter=true;
set hive.fetch.task.conversion=none;
set hive.exec.dynamic.partition.mode=nonstrict;

DROP TABLE orc_a;
DROP TABLE orc_b;

CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q smallint) CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
CREATE TABLE orc_b (id bigint, cfloat float) CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;

insert into table orc_a partition (y=2000, q) select cbigint, cdouble, csmallint % 10 from alltypesorc where cbigint is not null and csmallint > 0 order by cbigint asc;
insert into table orc_a partition (y=2001, q) select cbigint, cdouble, csmallint % 10 from alltypesorc where cbigint is not null and csmallint > 0 order by cbigint asc;
insert into table orc_b select cbigint, cfloat from alltypesorc where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;

set hive.cbo.enable=false;
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;

set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=10;
explain select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;

DROP TABLE orc_a;
DROP TABLE orc_b;
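This produces different results for the two selects, even though the query text is identical. The result of the second select, which runs with the sort-merge-bucket (SMB) join settings enabled, looks incorrect. cc djaiswal hagleitn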
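One way to make the mismatch explicit is to materialize both results and compare them group by group. The following is only a sketch, not part of the original report: the table names res_plain and res_smb and the column alias cnt are hypothetical, and it assumes that wrapping each select in a CTAS keeps the same join plan as the standalone query, which should be checked with explain. The statements would have to run before the final DROP TABLE statements.

```sql
-- Hypothetical helper tables; res_plain, res_smb and cnt are illustrative names.

-- Run after "set hive.cbo.enable=false;" and before the SMB settings are applied.
CREATE TABLE res_plain AS
select y, q, count(*) as cnt from orc_a a join orc_b b on a.id=b.id group by y, q;

-- Run after the SMB settings (hive.auto.convert.sortmerge.join=true etc.) are applied.
CREATE TABLE res_smb AS
select y, q, count(*) as cnt from orc_a a join orc_b b on a.id=b.id group by y, q;

-- List every (y, q) group whose count differs or that is missing from one result.
select coalesce(p.y, s.y) as y,
       coalesce(p.q, s.q) as q,
       p.cnt as plain_cnt,
       s.cnt as smb_cnt
from res_plain p
full outer join res_smb s on p.y = s.y and p.q = s.q
where p.cnt is null or s.cnt is null or p.cnt <> s.cnt;

DROP TABLE res_plain;
DROP TABLE res_smb;
```

An empty result from the comparison query would mean both plans agree; any returned row identifies a partition group where the two join strategies disagree.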
Attachments
Issue Links
- is duplicated by
  - HIVE-16791 Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results (Resolved)
- relates to
  - HIVE-16981 hive.optimize.bucketingsorting should compare the schema before removing RS (Closed)
  - HIVE-16761 LLAP IO: SMB joins fail elevator (Closed)