IMPALA-4866

Hash join node does not apply limits correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0, Impala 2.9.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Backend

      Description

      Limits are not applied correctly in several cases, and the number of rows returned is not updated correctly. Most of the issues are masked by the fact that the planner places an exchange on top of most joins, but they turn into correctness issues when num_nodes=1.

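      Note how these examples return 10 rows (the batch size) when the limit is set to 5. In the first example, the summary also reports 0 rows for the hash join node even though 10 rows were fetched.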

      [localhost:21000] > set batch_size=10;
      BATCH_SIZE set to 10
      [localhost:21000] > set num_nodes=1;
      NUM_NODES set to 1
      [localhost:21000] > select straight_join t1.id, t2.id from functional.alltypes t1 inner join functional.alltypes t2 on t1.id = t2.id limit 5;summary;
      Query: select straight_join t1.id, t2.id from functional.alltypes t1 inner join functional.alltypes t2 on t1.id = t2.id limit 5
      Query submitted at: 2017-02-01 16:26:29 (Coordinator: http://tarmstrong-box:25000)
      Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=414f96c409fdeace:79ad287d00000000
      +------+------+
      | id   | id   |
      +------+------+
      | 5460 | 5460 |
      | 5461 | 5461 |
      | 5462 | 5462 |
      | 5463 | 5463 |
      | 5464 | 5464 |
      | 5465 | 5465 |
      | 5466 | 5466 |
      | 5467 | 5467 |
      | 5468 | 5468 |
      | 5469 | 5469 |
      +------+------+
      Fetched 10 row(s) in 1.04s
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      | Operator        | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                 |
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      | 02:HASH JOIN    | 1      | 5.90ms   | 5.90ms   | 0     | 5          | 1.25 MB   | 31.37 KB      | INNER JOIN             |
      | |--01:SCAN HDFS | 1      | 7.09ms   | 7.09ms   | 7.30K | 7.30K      | 622.09 KB | 160.00 MB     | functional.alltypes t2 |
      | 00:SCAN HDFS    | 1      | 766.09ms | 766.09ms | 10    | 7.30K      | 606.09 KB | 160.00 MB     | functional.alltypes t1 |
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      
      [localhost:21000] > select straight_join t1.id, t2.id from functional.alltypes t1 right join functional.alltypes t2 on t1.id = t2.int_col + 100000 limit 5;summary;
      Query: select straight_join t1.id, t2.id from functional.alltypes t1 right join functional.alltypes t2 on t1.id = t2.int_col + 100000 limit 5
      Query submitted at: 2017-02-01 16:30:11 (Coordinator: http://tarmstrong-box:25000)
      Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=e485f8d3a8d2487:c8cb4f2100000000
      +------+------+
      | id   | id   |
      +------+------+
      | NULL | 3953 |
      | NULL | 3943 |
      | NULL | 3933 |
      | NULL | 3923 |
      | NULL | 3913 |
      | NULL | 3903 |
      | NULL | 3893 |
      | NULL | 3883 |
      | NULL | 3873 |
      | NULL | 3863 |
      +------+------+
      Fetched 10 row(s) in 1.05s
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      | Operator        | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                 |
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
      | 02:HASH JOIN    | 1      | 17.78ms  | 17.78ms  | 10    | 5          | 1.02 MB   | 62.74 KB      | RIGHT OUTER JOIN       |
      | |--01:SCAN HDFS | 1      | 8.13ms   | 8.13ms   | 7.30K | 7.30K      | 626.09 KB | 160.00 MB     | functional.alltypes t2 |
      | 00:SCAN HDFS    | 1      | 757.83ms | 757.83ms | 7.30K | 7.30K      | 630.09 KB | 160.00 MB     | functional.alltypes t1 |
      +-----------------+--------+----------+----------+-------+------------+-----------+---------------+------------------------+
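
For comparison, the expected behaviour can be sketched as follows. This is a minimal illustration in Python, not Impala's backend code: `apply_limit` and `make_batches` are hypothetical names, and rows are modelled as plain tuples. The point is only that an operator with a limit should truncate the batch that crosses the limit and keep its returned-row counter in step with what it actually emits.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, int]


def make_batches(rows: List[Row], batch_size: int) -> List[List[Row]]:
    """Split rows into fixed-size batches (hypothetical stand-in for row batches)."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def apply_limit(batches: Iterable[List[Row]], limit: int) -> Iterator[List[Row]]:
    """Yield batches so that at most `limit` rows are returned in total.

    Hypothetical sketch: the batch that crosses the limit is truncated,
    and the running count reflects the rows actually emitted.
    """
    rows_returned = 0
    for batch in batches:
        if rows_returned >= limit:
            break  # limit already reached; stop consuming input
        remaining = limit - rows_returned
        out = batch[:remaining]  # trim the batch that would exceed the limit
        rows_returned += len(out)
        if out:
            yield out


# Mirrors the first example: batch_size=10, limit 5.
rows = [(i, i) for i in range(5460, 5480)]
result = list(apply_limit(make_batches(rows, 10), 5))
total_rows = sum(len(b) for b in result)
```

With a batch size of 10 and a limit of 5, this sketch emits a single batch of 5 rows rather than the full 10-row batch seen in the output above.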
      

              People

              • Assignee:
                anujphadke Anuj Phadke
                Reporter:
                tarmstrong Tim Armstrong
              • Votes:
                0
                Watchers:
                4
