Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.2
-
None
Description
When a full outer join on the old (non-partitioned) HashJoinNode, if any join fragment has 0 build rows and 0 probe rows an extra null row will be produced;
In the following example running on a mini cluster with 3 impalads, the following query produces incorrect results because one join fragment instance has 0 build rows and 0 probe rows, and it produces an extra row with null tuples:
Query: select a.id, b.id
from
(select id
from functional.alltypestiny where id between 0 and 2) a
full outer join
(select id
from functional.alltypestiny where id between 1 and 3) b
on (a.id = b.id)
+------+------+
| id | id |
+------+------+
| NULL | NULL |
| 0 | NULL |
| 1 | 1 |
| 2 | 2 |
| NULL | 3 |
+------+------+
Fetched 5 row(s) in 0.96s
The first result (NULL, NULL) should not be there.
Query: explain select a.id, b.id from (select id from functional.alltypestiny where id between 0 and 2) a full outer join (select id from functional.alltypestiny where id between 1 and 3) b on (a.id = b.id) +--------------------------------------------------------------+ | Explain String | +--------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=32.00MB VCores=2 | | | | F03:PLAN FRAGMENT [UNPARTITIONED] | | 05:EXCHANGE [UNPARTITIONED] | | hosts=3 per-host-mem=unavailable | | tuple-ids=0N,2N row-size=8B cardinality=2 | | | | F02:PLAN FRAGMENT [HASH(id)] | | DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=05, UNPARTITIONED] | | 02:HASH JOIN [FULL OUTER JOIN, PARTITIONED] | | | hash predicates: id = id | | | hosts=3 per-host-mem=1B | | | tuple-ids=0N,2N row-size=8B cardinality=2 | | | | | |--04:EXCHANGE [HASH(id)] | | | hosts=3 per-host-mem=0B | | | tuple-ids=2 row-size=4B cardinality=1 | | | | | 03:EXCHANGE [HASH(id)] | | hosts=3 per-host-mem=0B | | tuple-ids=0 row-size=4B cardinality=1 | | | | F01:PLAN FRAGMENT [RANDOM] | | DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(id)] | | 01:SCAN HDFS [functional.alltypestiny, RANDOM] | | partitions=4/4 files=4 size=460B | | predicates: id >= 1, id <= 3 | | table stats: 8 rows total | | column stats: all | | hosts=3 per-host-mem=32.00MB | | tuple-ids=2 row-size=4B cardinality=1 | | | | F00:PLAN FRAGMENT [RANDOM] | | DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=03, HASH(id)] | | 00:SCAN HDFS [functional.alltypestiny, RANDOM] | | partitions=4/4 files=4 size=460B | | predicates: id >= 0, id <= 2 | | table stats: 8 rows total | | column stats: all | | hosts=3 per-host-mem=32.00MB | | tuple-ids=0 row-size=4B cardinality=1 | +--------------------------------------------------------------+ Fetched 41 row(s) in 4.78s