Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7227

TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.16.0
    • Fix Version/s: 1.17.0
    • Component/s: Metadata
    • Labels:
      None

      Description

      Here is query 78:

      WITH ws 
           AS (SELECT d_year                 AS ws_sold_year, 
                      ws_item_sk, 
                      ws_bill_customer_sk    ws_customer_sk, 
                      Sum(ws_quantity)       ws_qty, 
                      Sum(ws_wholesale_cost) ws_wc, 
                      Sum(ws_sales_price)    ws_sp 
               FROM   web_sales 
                      LEFT JOIN web_returns 
                             ON wr_order_number = ws_order_number 
                                AND ws_item_sk = wr_item_sk 
                      JOIN date_dim 
                        ON ws_sold_date_sk = d_date_sk 
               WHERE  wr_order_number IS NULL 
               GROUP  BY d_year, 
                         ws_item_sk, 
                         ws_bill_customer_sk), 
           cs 
           AS (SELECT d_year                 AS cs_sold_year, 
                      cs_item_sk, 
                      cs_bill_customer_sk    cs_customer_sk, 
                      Sum(cs_quantity)       cs_qty, 
                      Sum(cs_wholesale_cost) cs_wc, 
                      Sum(cs_sales_price)    cs_sp 
               FROM   catalog_sales 
                      LEFT JOIN catalog_returns 
                             ON cr_order_number = cs_order_number 
                                AND cs_item_sk = cr_item_sk 
                      JOIN date_dim 
                        ON cs_sold_date_sk = d_date_sk 
               WHERE  cr_order_number IS NULL 
               GROUP  BY d_year, 
                         cs_item_sk, 
                         cs_bill_customer_sk), 
           ss 
           AS (SELECT d_year                 AS ss_sold_year, 
                      ss_item_sk, 
                      ss_customer_sk, 
                      Sum(ss_quantity)       ss_qty, 
                      Sum(ss_wholesale_cost) ss_wc, 
                      Sum(ss_sales_price)    ss_sp 
               FROM   store_sales 
                      LEFT JOIN store_returns 
                             ON sr_ticket_number = ss_ticket_number 
                                AND ss_item_sk = sr_item_sk 
                      JOIN date_dim 
                        ON ss_sold_date_sk = d_date_sk 
               WHERE  sr_ticket_number IS NULL 
               GROUP  BY d_year, 
                         ss_item_sk, 
                         ss_customer_sk) 
      SELECT ss_item_sk, 
                     Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2) ratio, 
                     ss_qty                                              store_qty, 
                     ss_wc 
                     store_wholesale_cost, 
                     ss_sp 
                     store_sales_price, 
                     COALESCE(ws_qty, 0) + COALESCE(cs_qty, 0) 
                     other_chan_qty, 
                     COALESCE(ws_wc, 0) + COALESCE(cs_wc, 0) 
                     other_chan_wholesale_cost, 
                     COALESCE(ws_sp, 0) + COALESCE(cs_sp, 0) 
                     other_chan_sales_price 
      FROM   ss 
             LEFT JOIN ws 
                    ON ( ws_sold_year = ss_sold_year 
                         AND ws_item_sk = ss_item_sk 
                         AND ws_customer_sk = ss_customer_sk ) 
             LEFT JOIN cs 
                    ON ( cs_sold_year = ss_sold_year 
                         AND cs_item_sk = cs_item_sk 
                         AND cs_customer_sk = ss_customer_sk ) 
      WHERE  COALESCE(ws_qty, 0) > 0 
             AND COALESCE(cs_qty, 0) > 0 
             AND ss_sold_year = 1999 
      ORDER  BY ss_item_sk, 
                ss_qty DESC, 
                ss_wc DESC, 
                ss_sp DESC, 
                other_chan_qty, 
                other_chan_wholesale_cost, 
                other_chan_sales_price, 
                Round(ss_qty / ( COALESCE(ws_qty + cs_qty, 1) ), 2)
      LIMIT 100; 
      

      The profile for the new plan is 2338ae93-155b-356d-382e-0da949c6f439. Hash partition sender operator (10-00) takes 10-15 minutes. I am not sure why it takes so long. It has 10 minor fragments sending to receiver (06-05), which has 62 minor fragments. But hash partition sender (16-00) has 10 minor fragments sending to receiver (12-06), which has 220 minor fragments, and there is no performance issue.

      The profile for the old plan is 23387ab0-cb1c-cd5e-449a-c9bcefc901c1. Both plans use the same commit. The old plan is created by disabling statistics.

      I have not included the plans in the Jira because Jira has a max of 32K.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                gparai Gautam Parai
                Reporter:
                rhou Robert Hou
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: