Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Impala 4.1.0, Impala 4.1.1
Description
Found this bug while running a large-scale TPC-H benchmark. It can be reproduced with the following query:
use tpch_orc_def;
set enabled_runtime_filter_types=in_list;
select count(*)
from supplier, nation, region
where s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'EUROPE';
The query returns 0, which is wrong; the expected result is 1987. The query summary shows that the scan node on the "nation" table returns 0 rows:
04:HASH JOIN             1  1  445.629us  445.629us  0  2.00K   1.98 MB    1.94 MB   INNER JOIN, BROADCAST
|--07:EXCHANGE           1  1   40.466us   40.466us  1  1       16.00 KB   16.00 KB  BROADCAST
|  F02:EXCHANGE SENDER   1  1  217.341us  217.341us             8.60 KB    99.20 KB
|  02:SCAN HDFS          1  1    4.507ms    4.507ms  1  1       917.09 KB  96.00 MB  tpch_orc_def.region
03:HASH JOIN             1  1    2.112ms    2.112ms  0  10.00K  1.97 MB    1.94 MB   INNER JOIN, BROADCAST
|--06:EXCHANGE           1  1   27.803us   27.803us  0  25      0          16.00 KB  BROADCAST
|  F01:EXCHANGE SENDER   1  1   89.872us   89.872us             25.59 KB   32.00 KB
|  01:SCAN HDFS          1  1   12.833ms   12.833ms  0  25      32.00 KB   64.00 MB  tpch_orc_def.nation
00:SCAN HDFS             1  1  371.636us  371.636us  0  10.00K  16.00 KB   32.00 MB  tpch_orc_def.supplier
There is a runtime IN-list filter applied on this node:
01:SCAN HDFS [tpch_orc_def.nation, RANDOM]
   HDFS partitions=1/1 files=1 size=1.74KB
   runtime filters: RF000[in_list] -> n_regionkey
   stored statistics:
     table: rows=25 size=1.74KB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=25
   mem-estimate=64.00MB mem-reservation=32.00KB thread-reservation=1
   tuple-ids=1 row-size=4B cardinality=25
   in pipelines: 01(GETNEXT)
The filter is generated by the build side of the join, which reads the "region" table with the predicate "r_name = 'EUROPE'". Note that it is a global runtime filter generated by other impalads (not the impalad scanning the "nation" table).
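To make the expected behavior concrete, here is a minimal Python sketch of what an IN-list runtime filter is meant to do in this query. It is a simplified model and not Impala's implementation: the helper names `build_in_list` and `apply_in_list` are hypothetical, and the data is taken from the TPC-H specification (five regions with keys 0 to 4, and 25 nations with five per region).

```python
# Simplified model of an IN-list runtime filter; NOT Impala's implementation.
# Region keys and names follow the TPC-H specification.
REGION = [(0, "AFRICA"), (1, "AMERICA"), (2, "ASIA"), (3, "EUROPE"), (4, "MIDDLE EAST")]

def build_in_list(build_rows, predicate):
    """Collect the distinct join keys of build-side rows that pass the predicate."""
    return {key for key, name in build_rows if predicate(name)}

def apply_in_list(probe_keys, in_list):
    """Keep only the probe-side keys that appear in the IN-list."""
    return [k for k in probe_keys if k in in_list]

# TPC-H "nation" has 25 rows, five for each region key 0..4.
nation_regionkeys = [k for k in range(5) for _ in range(5)]

# Build side: region rows with r_name = 'EUROPE' produce the filter on n_regionkey.
rf000 = build_in_list(REGION, lambda name: name == "EUROPE")
survivors = apply_in_list(nation_regionkeys, rf000)
print(rf000, len(survivors))
```

Under these assumptions the filter contains a single key and five "nation" rows pass it, so a correctly applied filter cannot leave the scan with 0 rows.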
The profile shows that this filter rejects one file, which is the only file of the "nation" table.
Filter 0 (2.00 KB):
   - Files processed: 1 (1)
   - Files rejected: 1 (1)
   - Files total: 1 (1)
This is wrong since at least 5 rows in the file should pass the filter:
impala-shell> select count(*) from nation, region where n_regionkey = r_regionkey and r_name = 'EUROPE';
+----------+
| count(*) |
+----------+
| 5        |
+----------+
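The count above implies that the file must not be skipped. As a hedged illustration, the sketch below models a file-level rejection check based on column min/max statistics. This is an assumption made for illustration only: the report does not say how the rejection decision is made, and `should_reject_file` is a hypothetical helper, not an Impala or ORC function. The value range 0 to 4 for n_regionkey comes from the TPC-H specification.

```python
# Hypothetical model of statistics-based file rejection; NOT Impala's or ORC's code.

def may_contain_match(col_min, col_max, in_list):
    """Return True if any IN-list value lies within the column's [min, max] range."""
    return any(col_min <= v <= col_max for v in in_list)

def should_reject_file(col_min, col_max, in_list):
    """A file may be skipped only when no IN-list value can occur in it.

    An empty IN-list means the build side produced no keys, so nothing can match.
    """
    return not may_contain_match(col_min, col_max, in_list)

# n_regionkey in the single "nation" file spans 0..4; the filter holds EUROPE's key.
europe_filter = {3}
print(should_reject_file(0, 4, europe_filter))   # the file must be kept
print(should_reject_file(0, 4, {7}))             # a key outside the range allows skipping
```

In this model the file is kept for the EUROPE key, which matches the five rows returned by the two-table query and contradicts the "Files rejected: 1" counter in the profile.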
Issue Links
- is caused by IMPALA-11141 Use exact data types in IN-list filter (Resolved)
- links to