Details
Bug
Status: Resolved
Blocker
Resolution: Fixed
Impala 2.0
None
Description
In our test runs, we've seen TPCH-Q21 return incorrect results. I believe the reason is that an ANTI join could produce incorrect results when spilling.
Here are exec summaries from TPCH-Q21 runs that produced correct/incorrect results:
Run with correct results (no spilling):
Operator              #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
21:MERGING-EXCHANGE   1       341.969us  341.969us  300      100         0          -1.00 B        UNPARTITIONED
12:TOP-N              3       1.738ms    1.928ms    300      100         20.00 KB   4.10 KB
20:AGGREGATE          3       141.305ms  141.559ms  411      10.39K      6.29 MB    10.00 MB       FINALIZE
19:EXCHANGE           3       163.412us  178.198us  1.20K    10.39K      0          0              HASH(s_name)
11:AGGREGATE          3       159.478ms  212.594ms  1.20K    10.39K      11.47 MB   10.00 MB
10:HASH JOIN          3       1s485ms    1s575ms    4.14K    600.12K     838.05 MB  14.27 MB       LEFT ANTI JOIN, PARTITIONED
|--18:EXCHANGE        3       293.828ms  308.780ms  3.79M    600.12K     0          0              HASH(l3.l_orderkey)
|  05:SCAN HDFS       3       2s787ms    3s233ms    3.79M    600.12K     65.66 MB   264.00 MB      tpch.lineitem l3
09:HASH JOIN          3       4s791ms    4s880ms    73.09K   600.12K     838.05 MB  33.58 MB       LEFT SEMI JOIN, PARTITIONED
|--17:EXCHANGE        3       380.799ms  406.256ms  6.00M    6.00M       0          0              HASH(l2.l_orderkey)
|  04:SCAN HDFS       3       2s745ms    3s046ms    6.00M    6.00M       64.73 MB   264.00 MB      tpch.lineitem l2
16:EXCHANGE           3       3.863ms    3.915ms    75.87K   600.12K     0          0              HASH(l1.l_orderkey)
08:HASH JOIN          3       387.206ms  400.723ms  75.87K   600.12K     8.59 MB    28.00 B        INNER JOIN, BROADCAST
|--15:EXCHANGE        3       15.843us   19.148us   1        1           0          0              BROADCAST
|  03:SCAN HDFS       1       443.178ms  443.178ms  1        1           41.00 KB   32.00 MB       tpch.nation
07:HASH JOIN          3       3s889ms    4s269ms    1.83M    600.12K     566.03 MB  13.11 MB       INNER JOIN, BROADCAST
|--14:EXCHANGE        3       141.48ms   144.758ms  729.41K  500.00K     0          0              BROADCAST
|  02:SCAN HDFS       2       1s744ms    2s386ms    729.41K  500.00K     32.16 MB   176.00 MB      tpch.orders
06:HASH JOIN          3       1s763ms    1s879ms    3.79M    600.12K     12.43 MB   472.66 KB      INNER JOIN, BROADCAST
|--13:EXCHANGE        3       3.112ms    5.549ms    10.00K   10.00K      0          0              BROADCAST
|  00:SCAN HDFS       1       508.179ms  508.179ms  10.00K   10.00K      2.24 MB    32.00 MB       tpch.supplier
01:SCAN HDFS          3       1s854ms    1s895ms    3.79M    600.12K     65.38 MB   264.00 MB      tpch.lineitem l1
Run with incorrect results (nodes 9 and 10 spill):
Operator              #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
21:MERGING-EXCHANGE   1       694.171us  694.171us  300      100         0          -1.00 B        UNPARTITIONED
12:TOP-N              3       2.875ms    3.402ms    300      100         20.00 KB   4.10 KB
20:AGGREGATE          3       1s110ms    1s111ms    411      10.39K      6.29 MB    10.00 MB       FINALIZE
19:EXCHANGE           3       183.944us  198.813us  1.21K    10.39K      0          0              HASH(s_name)
11:AGGREGATE          3       371.973ms  519.24ms   1.21K    10.39K      8.92 MB    10.00 MB
10:HASH JOIN          3       2s069ms    2s159ms    13.91K   600.12K     838.05 MB  14.27 MB       LEFT ANTI JOIN, PARTITIONED
|--18:EXCHANGE        3       409.707ms  413.441ms  3.79M    600.12K     0          0              HASH(l3.l_orderkey)
|  05:SCAN HDFS       3       1m20s      1m34s      3.79M    600.12K     24.30 MB   264.00 MB      tpch.lineitem l3
09:HASH JOIN          3       41s041ms   41s195ms   73.09K   600.12K     838.05 MB  33.58 MB       LEFT SEMI JOIN, PARTITIONED
|--17:EXCHANGE        3       521.547ms  530.812ms  6.00M    6.00M       0          0              HASH(l2.l_orderkey)
|  04:SCAN HDFS       3       1m19s      1m33s      6.00M    6.00M       24.12 MB   264.00 MB      tpch.lineitem l2
16:EXCHANGE           3       8.868ms    10.867ms   75.87K   600.12K     0          0              HASH(l1.l_orderkey)
08:HASH JOIN          3       388.671ms  410.5ms    75.87K   600.12K     8.59 MB    28.00 B        INNER JOIN, BROADCAST
|--15:EXCHANGE        3       35.973us   41.837us   1        1           0          0              BROADCAST
|  03:SCAN HDFS       1       7s453ms    7s453ms    1        1           41.00 KB   32.00 MB       tpch.nation
07:HASH JOIN          3       42s495ms   43s984ms   1.83M    600.12K     566.03 MB  13.11 MB       INNER JOIN, BROADCAST
|--14:EXCHANGE        3       186.685ms  188.864ms  729.41K  500.00K     0          0              BROADCAST
|  02:SCAN HDFS       2       38s291ms   56s436ms   729.41K  500.00K     16.14 MB   176.00 MB      tpch.orders
06:HASH JOIN          3       5s841ms    8s681ms    3.79M    600.12K     12.43 MB   472.66 KB      INNER JOIN, BROADCAST
|--13:EXCHANGE        3       4.79ms     5.164ms    10.00K   10.00K      0          0              BROADCAST
|  00:SCAN HDFS       1       7s837ms    7s837ms    10.00K   10.00K      2.08 MB    32.00 MB       tpch.supplier
01:SCAN HDFS          3       43s272ms   54s934ms   3.79M    600.12K     83.99 MB   264.00 MB      tpch.lineitem l1
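Notice that plan node 10 (the LEFT ANTI JOIN) returns a very different number of rows in the two runs: 4.14K without spilling versus 13.91K with spilling. Its inputs, nodes 09 and 18, return the same row counts in both runs (73.09K and 3.79M), so the discrepancy first appears at node 10. The full profiles of the good and bad runs are attached.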
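To make the comparison less error-prone than reading the summaries by eye, the #Rows column of two summaries can be diffed per plan node. The following is only a minimal sketch: it assumes each operator sits on its own line in the usual impala-shell summary column order (operator, #Hosts, Avg Time, Max Time, #Rows, ...), and the helper names `rows_by_node` and `diverging_nodes` are made up for illustration. The sample rows are copied from the two summaries above.

```python
import re

# Assumed column order: operator, #Hosts, Avg Time, Max Time, #Rows, ...
ROW = re.compile(
    r"^[|\-\s]*(\d+):([A-Z-]+(?: [A-Z-]+)*)\s+"  # node id and operator name
    r"(\d+)\s+(\S+)\s+(\S+)\s+(\S+)"             # #Hosts, Avg Time, Max Time, #Rows
)

def rows_by_node(summary):
    """Map plan node id -> #Rows string for every operator line in a summary."""
    out = {}
    for line in summary.splitlines():
        m = ROW.match(line)
        if m:
            out[int(m.group(1))] = m.group(6)
    return out

def diverging_nodes(good, bad):
    """Return {node id: (good #Rows, bad #Rows)} for nodes whose counts differ."""
    g, b = rows_by_node(good), rows_by_node(bad)
    return {n: (g[n], b[n]) for n in sorted(g.keys() & b.keys()) if g[n] != b[n]}

# A few rows from the run without spilling.
GOOD = """\
11:AGGREGATE 3 159.478ms 212.594ms 1.20K 10.39K 11.47 MB 10.00 MB
10:HASH JOIN 3 1s485ms 1s575ms 4.14K 600.12K 838.05 MB 14.27 MB LEFT ANTI JOIN, PARTITIONED
|--18:EXCHANGE 3 293.828ms 308.780ms 3.79M 600.12K 0 0 HASH(l3.l_orderkey)
09:HASH JOIN 3 4s791ms 4s880ms 73.09K 600.12K 838.05 MB 33.58 MB LEFT SEMI JOIN, PARTITIONED
"""

# The same rows from the run where nodes 9 and 10 spill.
BAD = """\
11:AGGREGATE 3 371.973ms 519.24ms 1.21K 10.39K 8.92 MB 10.00 MB
10:HASH JOIN 3 2s069ms 2s159ms 13.91K 600.12K 838.05 MB 14.27 MB LEFT ANTI JOIN, PARTITIONED
|--18:EXCHANGE 3 409.707ms 413.441ms 3.79M 600.12K 0 0 HASH(l3.l_orderkey)
09:HASH JOIN 3 41s041ms 41s195ms 73.09K 600.12K 838.05 MB 33.58 MB LEFT SEMI JOIN, PARTITIONED
"""

print(diverging_nodes(GOOD, BAD))
# {10: ('4.14K', '13.91K'), 11: ('1.20K', '1.21K')}
```

Nodes 09 and 18 do not show up in the result, which is consistent with node 10 being the first operator where the two runs disagree; node 11 differs only because it consumes node 10's output.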
You can reproduce the problem locally by repeatedly running TPCH-Q21 from the shell. Eventually a run will spill and produce incorrect results. You can track /memz to see how many more runs you need: once the memory consumption is close to the limit, spilling will trigger.
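The repeated runs can be scripted. The sketch below is an assumption-laden illustration, not the procedure used in our test runs: the query file name `tpch-q21.sql`, the impalad address `localhost:21000`, and the database name `tpch` are placeholders, and it treats the first run as the correct baseline, which only holds if that run does not spill. It relies on Q21 producing deterministic, ordered output, so any difference from the baseline indicates a wrong result.

```python
import subprocess

def run_q21(query_file="tpch-q21.sql", impalad="localhost:21000"):
    """Run the query through impala-shell and return its delimited output.

    The query file name, impalad address and database are placeholders.
    """
    result = subprocess.run(
        ["impala-shell", "-i", impalad, "-d", "tpch", "-B", "-f", query_file],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def first_mismatch(run, max_runs=50):
    """Call run() up to max_runs times.

    Returns the 1-based index of the first run whose output differs from
    run 1 (assumed correct), or None if all runs agree.
    """
    baseline = run()
    for i in range(2, max_runs + 1):
        if run() != baseline:
            return i
    return None
```

Calling `first_mismatch(run_q21)` against a local cluster would then report which run first diverged; passing the runner as a parameter keeps the loop usable with any other way of executing the query.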