[HIVE-16965] SMB join may produce incorrect results - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0
Component/s: None
Labels: None

Description

Running the following script on MiniTez:

set hive.mapred.mode=nonstrict;
SET hive.vectorized.execution.enabled=true;
SET hive.exec.orc.default.buffer.size=32768;
SET hive.exec.orc.default.row.index.stride=1000;
SET hive.optimize.index.filter=true;
set hive.fetch.task.conversion=none;
set hive.exec.dynamic.partition.mode=nonstrict;

DROP TABLE orc_a;
DROP TABLE orc_b;

CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q smallint)
  CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
CREATE TABLE orc_b (id bigint, cfloat float)
  CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;

insert into table orc_a partition (y=2000, q)
select cbigint, cdouble, csmallint % 10 from alltypesorc
  where cbigint is not null and csmallint > 0 order by cbigint asc;
insert into table orc_a partition (y=2001, q)
select cbigint, cdouble, csmallint % 10 from alltypesorc
  where cbigint is not null and csmallint > 0 order by cbigint asc;

insert into table orc_b 
select cbigint, cfloat from alltypesorc
  where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;

set hive.cbo.enable=false;

-- First run: the join before the SMB settings below are applied.
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;

-- Enable automatic conversion to a sort-merge-bucket (SMB) join.
set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=10;

-- Second run: the same query, preceded by its plan.
explain
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;

DROP TABLE orc_a;
DROP TABLE orc_b;

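The two selects produce different results even though the query text is identical. The result of the second one, which runs after the SMB join settings are enabled, looks incorrect. cc djaiswal hagleitn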
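One way to cross-check which of the two results is wrong is to rerun the query with the sort-merge conversion switched off again and compare the counts. The following is only a sketch, not part of the original report: it reuses the settings and query from the script above, assumes it is run before the final DROP TABLE statements, and makes no claim about which join strategy Hive will pick instead, so the explain output should be inspected.

```sql
-- Sketch only: run while orc_a and orc_b still exist (before the final DROPs).
-- Turn the sort-merge-bucket conversion back off.
set hive.auto.convert.sortmerge.join=false;
set hive.optimize.bucketmapjoin.sortedmerge=false;

-- Check which join strategy the planner chooses now.
explain
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;

-- Compare these counts against the two earlier runs.
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
```

If the counts match the first run and differ from the second, that points at the SMB plan rather than at the data.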

Attachments

HIVE-16965.1.patch  24/Jul/17 18:03   4 kB  Deepak Jaiswal
HIVE-16965.2.patch  24/Jul/17 18:46  50 kB  Deepak Jaiswal
HIVE-16965.3.patch  24/Jul/17 21:34  36 kB  Deepak Jaiswal
HIVE-16965.4.patch  25/Jul/17 19:59   6 kB  Deepak Jaiswal
HIVE-16965.5.patch  27/Jul/17 17:44   6 kB  Deepak Jaiswal
HIVE-16965.6.patch  27/Jul/17 20:36   7 kB  Deepak Jaiswal
HIVE-16965.7.patch  27/Jul/17 21:21   7 kB  Deepak Jaiswal
HIVE-16965.8.patch  28/Jul/17 02:57   7 kB  Deepak Jaiswal

Issue Links

is duplicated by

HIVE-16791 Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results (Resolved)

relates to

HIVE-16981 hive.optimize.bucketingsorting should compare the schema before removing RS (Closed)
HIVE-16761 LLAP IO: SMB joins fail elevator (Closed)

People

Assignee: Deepak Jaiswal
Reporter: Sergey Shelukhin
Votes: 0
Watchers: 7

Dates

Created: 26/Jun/17 23:00
Updated: 22/May/18 23:58
Resolved: 28/Jul/17 21:11