Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2502

Use equivalence classes when determining where to fragment a plan

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.3.0
    • Fix Version/s: impala 2.5.1
    • Component/s: Frontend
    • Labels:

      Description

      Equivalence classes can be used to avoid unnecessary exchanges when partitioning/fragmenting a plan. For example in TPC-H Q-13 the current plan includes

      F03:PLAN FRAGMENT [HASH(c_custkey)]
        DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=10, HASH(c_count)]
        04:AGGREGATE
        |  output: count(*)
        |  group by: count(o_orderkey)
        |  hosts=10 per-host-mem=891.04MB
        |  tuple-ids=4 row-size=16B cardinality=53086384
        |
        09:AGGREGATE [FINALIZE]
        |  output: count:merge(o_orderkey)
        |  group by: c_custkey
        |  hosts=10 per-host-mem=89.10MB
        |  tuple-ids=2 row-size=16B cardinality=53086384
        |
        08:EXCHANGE [HASH(c_custkey)]
           hosts=10 per-host-mem=0B
           tuple-ids=2 row-size=16B cardinality=53086384
      
      F02:PLAN FRAGMENT [HASH(o_custkey)]
        DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=08, HASH(c_custkey)]
        03:AGGREGATE
        |  output: count(o_orderkey)
        |  group by: c_custkey
        |  hosts=10 per-host-mem=891.04MB
        |  tuple-ids=2 row-size=16B cardinality=53086384
        |
        02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
        |  hash predicates: o_custkey = c_custkey
        |  hosts=10 per-host-mem=37.77MB
        |  tuple-ids=1N,0 row-size=88B cardinality=405000000
        |
        |--07:EXCHANGE [HASH(c_custkey)]
        |     hosts=10 per-host-mem=0B
        |     tuple-ids=0 row-size=8B cardinality=45000000
        |
        06:EXCHANGE [HASH(o_custkey)]
           hosts=10 per-host-mem=0B
           tuple-ids=1 row-size=80B cardinality=405000000
      

      These two fragments have equivalent input partitioning since o_custkey = c_custkey. The fragment below shows what the might be done instead.

      F02:PLAN FRAGMENT [HASH(c_custkey)]
        DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=10, HASH(c_count)]
        04:AGGREGATE
        |  output: count(*)
        |  group by: count(o_orderkey)
        |  hosts=10 per-host-mem=891.04MB
        |  tuple-ids=4 row-size=16B cardinality=53086384
        |
        09:AGGREGATE [FINALIZE]
        |  output: count(o_orderkey)
        |  group by: c_custkey
        |  hosts=10 per-host-mem=891.04MB
        |  tuple-ids=2 row-size=16B cardinality=53086384
        |
        02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
        |  hash predicates: o_custkey = c_custkey
        |  hosts=10 per-host-mem=37.77MB
        |  tuple-ids=1N,0 row-size=88B cardinality=405000000
        |
        |--07:EXCHANGE [HASH(c_custkey)]
        |     hosts=10 per-host-mem=0B
        |     tuple-ids=0 row-size=8B cardinality=45000000
        |
        06:EXCHANGE [HASH(o_custkey)]
           hosts=10 per-host-mem=0B
           tuple-ids=1 row-size=80B cardinality=405000000
      

        Attachments

          Activity

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              caseyc casey
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: