Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2932

Extend DistributedPlanner::createHashJoinFragment to account for hash table build cost

    XMLWordPrintableJSON

    Details

      Description

      Currently the decision to do a shuffle Vs. broadcast join doesn't take into account the CPU/memory cost for the hash table build.
      The lack of cost for hash table build results in large broadcast joins on big clusters which ends up being less efficient than a shuffle join.

      Other factors that need to be taken into account are :

      • Query memory limit (from runtime or MEM_LIMIT)
      • Number of concurrent fragments in the query and memory requirement by each fragment
      • Number of broadcast joins and shuffle joins in the current fragment

      This query can be used to verify the fix

      select d_year AS cs_sold_year,
          sum(cs_quantity) cs_qty,
          sum(cs_wholesale_cost) cs_wc,
          sum(cs_sales_price) cs_sp
         from catalog_sales
         left  join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
         join date_dim on cs_sold_date_sk = d_date_sk
         where cr_order_number is null
         group by d_year
      

      Where the baseline is

      select /* +straight_join */ d_year AS cs_sold_year,
          sum(cs_quantity) cs_qty,
          sum(cs_wholesale_cost) cs_wc,
          sum(cs_sales_price) cs_sp
         from catalog_sales
         left  join  /* +shuffle */   catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
         join date_dim on cs_sold_date_sk = d_date_sk
         where cr_order_number is null
         group by d_year
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                twmarshall Thomas Tauber-Marshall
                Reporter:
                mmokhtar Mostafa Mokhtar
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: