Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2932

Extend DistributedPlanner::createHashJoinFragment to account for hash table build cost

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      Currently the decision to do a shuffle Vs. broadcast join doesn't take into account the CPU/memory cost for the hash table build.
      The lack of cost for hash table build results in large broadcast joins on big clusters which ends up being less efficient than a shuffle join.

      Other factors that need to be taken into account are :

      • Query memory limit (from runtime or MEM_LIMIT)
      • Number of concurrent fragments in the query and memory requirement by each fragment
      • Number of broadcast joins and shuffle joins in the current fragment

      This query can be used to verify the fix

      select d_year AS cs_sold_year,
          sum(cs_quantity) cs_qty,
          sum(cs_wholesale_cost) cs_wc,
          sum(cs_sales_price) cs_sp
         from catalog_sales
         left  join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
         join date_dim on cs_sold_date_sk = d_date_sk
         where cr_order_number is null
         group by d_year
      

      Where the baseline is

      select /* +straight_join */ d_year AS cs_sold_year,
          sum(cs_quantity) cs_qty,
          sum(cs_wholesale_cost) cs_wc,
          sum(cs_sales_price) cs_sp
         from catalog_sales
         left  join  /* +shuffle */   catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
         join date_dim on cs_sold_date_sk = d_date_sk
         where cr_order_number is null
         group by d_year
      

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            twmarshall Thomas Tauber-Marshall
            mmokhtar Mostafa Mokhtar
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment