Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-24872 PTF vectorization coverage
  3. HIVE-24873

TPCDS query51 doesn't vectorize: Only PTF directly under reduce-shuffle is supported

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      EXPLAIN VECTORIZATION DETAIL WITH web_v1 as (
      select
        ws_item_sk item_sk, d_date,
        sum(sum(ws_sales_price))
            over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales
      from web_sales
          ,date_dim
      where ws_sold_date_sk=d_date_sk
        and d_month_seq between 1214 and 1214+11
        and ws_item_sk is not NULL
      group by ws_item_sk, d_date),
      store_v1 as (
      select
        ss_item_sk item_sk, d_date,
        sum(sum(ss_sales_price))
            over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales
      from store_sales
          ,date_dim
      where ss_sold_date_sk=d_date_sk
        and d_month_seq between 1214 and 1214+11
        and ss_item_sk is not NULL
      group by ss_item_sk, d_date)
       select  *
      from (select item_sk
           ,d_date
           ,web_sales
           ,store_sales
           ,max(web_sales)
               over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative
           ,max(store_sales)
               over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative
           from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk
                       ,case when web.d_date is not null then web.d_date else store.d_date end d_date
                       ,web.cume_sales web_sales
                       ,store.cume_sales store_sales
                 from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk
                                                                and web.d_date = store.d_date)
                )x )y
      where web_cumulative > store_cumulative
      order by item_sk
              ,d_date
      limit 100;
      
      Reducer 2
      notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported
      window functions:
          window function: GenericUDAFSumHiveDecimal
          window frame: ROWS PRECEDING(MAX)~CURRENT
      
      ...
      
      Reducer 8
      notVectorizedReason: PTF operator: Only PTF directly under reduce-shuffle is supported
      window functions:
        window function: GenericUDAFSumHiveDecimal
        window frame: ROWS PRECEDING(MAX)~CURRENT |
      

      The interesting part is:

      explain vectorization detail select
        ws_item_sk item_sk, d_date,
        sum(sum(ws_sales_price))
            over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales
      from web_sales
          ,date_dim
      where ws_sold_date_sk=d_date_sk
        and d_month_seq between 1214 and 1214+11
        and ws_item_sk is not NULL
      group by ws_item_sk, d_date;
      

      the same applies to query63:

      ...
      ,avg(sum(ss_sales_price)) over (partition by i_manager_id) avg_monthly_sales
      ...
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            abstractdog László Bodor
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: