Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Impala 2.8.0, Impala 2.9.0
Description
There are several cases where limits are not applied correctly and the number of rows returned is not updated correctly. Most of the issues are masked by the fact that the planner places an exchange on top of most joins, but they become correctness issues when num_nodes=1.
Note how these examples return 10 rows (the batch size) when the limit is set to 5.
[localhost:21000] > set batch_size=10;
BATCH_SIZE set to 10
[localhost:21000] > set num_nodes=1;
NUM_NODES set to 1
[localhost:21000] > select straight_join t1.id, t2.id from functional.alltypes t1 inner join functional.alltypes t2 on t1.id = t2.id limit 5;summary;
Query: select straight_join t1.id, t2.id from functional.alltypes t1 inner join functional.alltypes t2 on t1.id = t2.id limit 5
Query submitted at: 2017-02-01 16:26:29 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=414f96c409fdeace:79ad287d00000000
+------+------+
| id   | id   |
+------+------+
| 5460 | 5460 |
| 5461 | 5461 |
| 5462 | 5462 |
| 5463 | 5463 |
| 5464 | 5464 |
| 5465 | 5465 |
| 5466 | 5466 |
| 5467 | 5467 |
| 5468 | 5468 |
| 5469 | 5469 |
+------+------+
Fetched 10 row(s) in 1.04s
+-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
| Operator        | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                 |
+-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
| 02:HASH JOIN    | 1      | 5.90ms   | 5.90ms   | 0     | 5          | 1.25 MB   | 31.37 KB      | INNER JOIN             |
| |--01:SCAN HDFS | 1      | 7.09ms   | 7.09ms   | 7.30K | 7.30K      | 622.09 KB | 160.00 MB     | functional.alltypes t2 |
| 00:SCAN HDFS    | 1      | 766.09ms | 766.09ms | 10    | 7.30K      | 606.09 KB | 160.00 MB     | functional.alltypes t1 |
+-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
[localhost:21000] > select straight_join t1.id, t2.id from functional.alltypes t1 right join functional.alltypes t2 on t1.id = t2.int_col + 100000 limit 5;summary;
Query: select straight_join t1.id, t2.id from functional.alltypes t1 right join functional.alltypes t2 on t1.id = t2.int_col + 100000 limit 5
Query submitted at: 2017-02-01 16:30:11 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=e485f8d3a8d2487:c8cb4f2100000000
+------+------+
| id   | id   |
+------+------+
| NULL | 3953 |
| NULL | 3943 |
| NULL | 3933 |
| NULL | 3923 |
| NULL | 3913 |
| NULL | 3903 |
| NULL | 3893 |
| NULL | 3883 |
| NULL | 3873 |
| NULL | 3863 |
+------+------+
Fetched 10 row(s) in 1.05s
+-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
| Operator        | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                 |
+-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
| 02:HASH JOIN    | 1      | 17.78ms  | 17.78ms  | 10    | 5          | 1.02 MB   | 62.74 KB      | RIGHT OUTER JOIN       |
| |--01:SCAN HDFS | 1      | 8.13ms   | 8.13ms   | 7.30K | 7.30K      | 626.09 KB | 160.00 MB     | functional.alltypes t2 |
| 00:SCAN HDFS    | 1      | 757.83ms | 757.83ms | 7.30K | 7.30K      | 630.09 KB | 160.00 MB     | functional.alltypes t1 |
+-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
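For illustration only: one way a whole batch of 10 rows can surface despite limit 5 is an operator that checks the limit between batches but never trims the final batch. The toy Python model below is not Impala's code, and it does not claim to be the actual root cause here. The names scan_batches, untrimmed_limit and trimmed_limit are hypothetical.

```python
from typing import Iterable, Iterator, List


def scan_batches(rows: List[int], batch_size: int) -> Iterator[List[int]]:
    """Yield rows in fixed-size batches, like a batch-oriented scan (toy model)."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def untrimmed_limit(batches: Iterable[List[int]], limit: int) -> List[int]:
    """Hypothetical faulty operator: checks the limit only between batches."""
    out: List[int] = []
    for batch in batches:
        if len(out) >= limit:
            break
        out.extend(batch)  # the whole batch passes through untrimmed
    return out


def trimmed_limit(batches: Iterable[List[int]], limit: int) -> List[int]:
    """Operator that truncates the final batch to the rows still allowed."""
    out: List[int] = []
    for batch in batches:
        remaining = limit - len(out)
        if remaining <= 0:
            break
        out.extend(batch[:remaining])
    return out


rows = list(range(5460, 5560))
buggy = untrimmed_limit(scan_batches(rows, batch_size=10), limit=5)
fixed = trimmed_limit(scan_batches(rows, batch_size=10), limit=5)
print(len(buggy), len(fixed))  # prints "10 5"
```

With batch_size=10 and limit=5, the untrimmed variant returns a full batch of 10 rows, matching the symptom in the transcripts, while the trimmed variant returns 5.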
Issue Links
- is depended upon by: IMPALA-3335 Allow single-node optimization with joins. (Resolved)