Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-680

Bad plan for TPCH-Q21 (duplicate/redundant partitioning of left semi-join fragment)

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 1.2.1
    • Impala 1.2.2
    • None
    • None

    Description

      When moving from Hive computed column stats to Impala computed column stats (more accurate) the plan for TPCH-Q21 changed slightly. With updated column stats the left outer join was changed from broadcast to partitioned (which I believe is okay), but the plan change also split the left outer join and semi-join into different plan fragments. This left semi join fragment is now partitioned on l1.l_orderkey and then re-partitioned on the same l1.l_orderkey (via STREAM DATA SINK) which is unnecessary/redundant.

      Query: explain select s_name, count(*) as numwait from lineitem l1 join supplier s on (s.s_suppkey = l1.l_suppkey) join orders o on (o.o_orderkey = l1.l_orderkey) join nation n on (s.s_nationkey = n.n_nationkey) left semi join lineitem l2 on (l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey) left outer join lineitem l3 on (l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate) where l3.l_orderkey is null and n_name = 'SAUDI ARABIA' and o_orderstatus = 'F' group by s_name order by numwait desc, s_name limit 100
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=282.53MB VCores=4 |
      |                                                           |
      | PLAN FRAGMENT 0                                           |
      |   PARTITION: UNPARTITIONED                                |
      |                                                           |
      |   23:TOP-N                                                |
      |   |  order by: COUNT(*) DESC, s_name ASC                  |
      |   |  limit: 100                                           |
      |   |  cardinality: 100                                     |
      |   |  per-host memory: unavailable                         |
      |   |  tuple ids: 6                                         |
      |   |                                                       |
      |   22:EXCHANGE                                             |
      |      cardinality: 100                                     |
      |      per-host memory: unavailable                         |
      |      tuple ids: 6                                         |
      |                                                           |
      | PLAN FRAGMENT 1                                           |
      |   PARTITION: HASH_PARTITIONED: s_name                     |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 22                                       |
      |     UNPARTITIONED                                         |
      |                                                           |
      |   12:TOP-N                                                |
      |   |  order by: COUNT(*) DESC, s_name ASC                  |
      |   |  limit: 100                                           |
      |   |  cardinality: 100                                     |
      |   |  per-host memory: 2.34KB                              |
      |   |  tuple ids: 6                                         |
      |   |                                                       |
      |   21:AGGREGATE (merge finalize)                           |
      |   |  output: SUM(COUNT(*))                                |
      |   |  group by: s_name                                     |
      |   |  cardinality: 10388                                   |
      |   |  per-host memory: 10.00MB                             |
      |   |  tuple ids: 6                                         |
      |   |                                                       |
      |   20:EXCHANGE                                             |
      |      cardinality: 20776                                   |
      |      per-host memory: 0B                                  |
      |      tuple ids: 6                                         |
      |                                                           |
      | PLAN FRAGMENT 2                                           |
      |   PARTITION: HASH_PARTITIONED: l1.l_orderkey              |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 20                                       |
      |     HASH_PARTITIONED: s_name                              |
      |                                                           |
      |   11:AGGREGATE                                            |
      |   |  output: COUNT(*)                                     |
      |   |  group by: s_name                                     |
      |   |  cardinality: 10388                                   |
      |   |  per-host memory: 10.00MB                             |
      |   |  tuple ids: 6                                         |
      |   |                                                       |
      |   10:HASH JOIN                                            |
      |   |  join op: LEFT OUTER JOIN (PARTITIONED)               |
      |   |  hash predicates:                                     |
      |   |    l1.l_orderkey = l3.l_orderkey                      |
      |   |  other join predicates: l3.l_suppkey != l1.l_suppkey  |
      |   |  other predicates: l3.l_orderkey IS NULL              |
      |   |  cardinality: 126211                                  |
      |   |  per-host memory: 644.66KB                            |
      |   |  tuple ids: 0 1 2 3 4 5N                              |
      |   |                                                       |
      |   |----19:EXCHANGE                                        |
      |   |       cardinality: 600122                             |
      |   |       per-host memory: 0B                             |
      |   |       tuple ids: 5                                    |
      |   |                                                       |
      |   18:EXCHANGE                                             |
      |      cardinality: 317788                                  |
      |      per-host memory: 0B                                  |
      |      tuple ids: 0 1 2 3 4                                 |
      |                                                           |
      | PLAN FRAGMENT 3                                           |
      |   PARTITION: RANDOM                                       |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 19                                       |
      |     HASH_PARTITIONED: l3.l_orderkey                       |
      |                                                           |
      |   9:SCAN HDFS                                             |
      |      table=tpch.lineitem #partitions=1/1 size=718.94MB    |
      |      predicates: l3.l_receiptdate > l3.l_commitdate       |
      |      table stats: 6001215 rows total                      |
      |      column stats: all                                    |
      |      cardinality: 600122                                  |
      |      per-host memory: 264.00MB                            |
      |      tuple ids: 5                                         |
      |                                                           |
      | PLAN FRAGMENT 4                                           |
      |   PARTITION: HASH_PARTITIONED: l1.l_orderkey              |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 18                                       |
      |     HASH_PARTITIONED: l1.l_orderkey                       |
      |                                                           |
      |   8:HASH JOIN                                             |
      |   |  join op: LEFT SEMI JOIN (PARTITIONED)                |
      |   |  hash predicates:                                     |
      |   |    l1.l_orderkey = l2.l_orderkey                      |
      |   |  other predicates: l2.l_suppkey != l1.l_suppkey       |
      |   |  cardinality: 317788                                  |
      |   |  per-host memory: 6.30MB                              |
      |   |  tuple ids: 0 1 2 3 4                                 |
      |   |                                                       |
      |   |----17:EXCHANGE                                        |
      |   |       cardinality: 6001215                            |
      |   |       per-host memory: 0B                             |
      |   |       tuple ids: 4                                    |
      |   |                                                       |
      |   16:EXCHANGE                                             |
      |      cardinality: 80016                                   |
      |      per-host memory: 0B                                  |
      |      tuple ids: 0 1 2 3                                   |
      |                                                           |
      | PLAN FRAGMENT 5                                           |
      |   PARTITION: RANDOM                                       |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 17                                       |
      |     HASH_PARTITIONED: l2.l_orderkey                       |
      |                                                           |
      |   7:SCAN HDFS                                             |
      |      table=tpch.lineitem #partitions=1/1 size=718.94MB    |
      |      table stats: 6001215 rows total                      |
      |      column stats: all                                    |
      |      cardinality: 6001215                                 |
      |      per-host memory: 264.00MB                            |
      |      tuple ids: 4                                         |
      |                                                           |
      | PLAN FRAGMENT 6                                           |
      |   PARTITION: RANDOM                                       |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 16                                       |
      |     HASH_PARTITIONED: l1.l_orderkey                       |
      |                                                           |
      |   6:HASH JOIN                                             |
      |   |  join op: INNER JOIN (BROADCAST)                      |
      |   |  hash predicates:                                     |
      |   |    s.s_nationkey = n.n_nationkey                      |
      |   |  cardinality: 80016                                   |
      |   |  per-host memory: 4B                                  |
      |   |  tuple ids: 0 1 2 3                                   |
      |   |                                                       |
      |   |----15:EXCHANGE                                        |
      |   |       cardinality: 1                                  |
      |   |       per-host memory: 0B                             |
      |   |       tuple ids: 3                                    |
      |   |                                                       |
      |   4:HASH JOIN                                             |
      |   |  join op: INNER JOIN (BROADCAST)                      |
      |   |  hash predicates:                                     |
      |   |    l1.l_orderkey = o.o_orderkey                       |
      |   |  cardinality: 2000405                                 |
      |   |  per-host memory: 1.57MB                              |
      |   |  tuple ids: 0 1 2                                     |
      |   |                                                       |
      |   |----14:EXCHANGE                                        |
      |   |       cardinality: 500000                             |
      |   |       per-host memory: 0B                             |
      |   |       tuple ids: 2                                    |
      |   |                                                       |
      |   2:HASH JOIN                                             |
      |   |  join op: INNER JOIN (BROADCAST)                      |
      |   |  hash predicates:                                     |
      |   |    l1.l_suppkey = s.s_suppkey                         |
      |   |  cardinality: 6001215                                 |
      |   |  per-host memory: 32.23KB                             |
      |   |  tuple ids: 0 1                                       |
      |   |                                                       |
      |   |----13:EXCHANGE                                        |
      |   |       cardinality: 10000                              |
      |   |       per-host memory: 0B                             |
      |   |       tuple ids: 1                                    |
      |   |                                                       |
      |   0:SCAN HDFS                                             |
      |      table=tpch.lineitem #partitions=1/1 size=718.94MB    |
      |      table stats: 6001215 rows total                      |
      |      column stats: all                                    |
      |      cardinality: 6001215                                 |
      |      per-host memory: 264.00MB                            |
      |      tuple ids: 0                                         |
      |                                                           |
      | PLAN FRAGMENT 7                                           |
      |   PARTITION: RANDOM                                       |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 15                                       |
      |     UNPARTITIONED                                         |
      |                                                           |
      |   5:SCAN HDFS                                             |
      |      table=tpch.nation #partitions=1/1 size=2.15KB        |
      |      predicates: n_name = 'SAUDI ARABIA'                  |
      |      table stats: 25 rows total                           |
      |      column stats: all                                    |
      |      cardinality: 1                                       |
      |      per-host memory: 32.00MB                             |
      |      tuple ids: 3                                         |
      |                                                           |
      | PLAN FRAGMENT 8                                           |
      |   PARTITION: RANDOM                                       |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 14                                       |
      |     UNPARTITIONED                                         |
      |                                                           |
      |   3:SCAN HDFS                                             |
      |      table=tpch.orders #partitions=1/1 size=162.56MB      |
      |      predicates: o_orderstatus = 'F'                      |
      |      table stats: 1500000 rows total                      |
      |      column stats: all                                    |
      |      cardinality: 500000                                  |
      |      per-host memory: 176.00MB                            |
      |      tuple ids: 2                                         |
      |                                                           |
      | PLAN FRAGMENT 9                                           |
      |   PARTITION: RANDOM                                       |
      |                                                           |
      |   STREAM DATA SINK                                        |
      |     EXCHANGE ID: 13                                       |
      |     UNPARTITIONED                                         |
      |                                                           |
      |   1:SCAN HDFS                                             |
      |      table=tpch.supplier #partitions=1/1 size=1.33MB      |
      |      table stats: 10000 rows total                        |
      |      column stats: all                                    |
      |      cardinality: 10000                                   |
      |      per-host memory: 32.00MB                             |
      |      tuple ids: 1                                         |
      +-----------------------------------------------------------+
      

      Attachments

        Activity

          People

            alex.behm Alexander Behm
            lskuff Lenni Kuff
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: