IMPALA / IMPALA-4530

Sort node after exchange should start sorting after the first RowBatch is received


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend

    Description

      The sort node after an exchange doesn't start sorting until all data has been received, which adds significant latency to the query.
      It is not clear whether this optimization would still make sense when a scan is followed by a sort running in the same thread.
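      The difference between the current and the proposed behavior can be sketched in Python. This is a minimal sketch, not Impala's C++ sorter. Lists of tuples stand in for RowBatches, and `blocking_sort` and `incremental_sort` are hypothetical names. The incremental variant sorts each batch into a run as soon as it is received, then k-way merges the runs with `heapq.merge` at end of input. In a real engine the per-batch work could overlap with receiving later batches. Here the sketch only shows when the sorting work is done.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical row shape: (sort_key, payload). A list of rows models one RowBatch.
Row = Tuple[int, int]


def blocking_sort(batches: Iterable[List[Row]]) -> List[Row]:
    """Behavior described in this issue: buffer every batch, then sort once."""
    buffered: List[Row] = []
    for batch in batches:
        # Nothing is sorted until the exchange has delivered its last batch.
        buffered.extend(batch)
    return sorted(buffered)


def incremental_sort(batches: Iterable[List[Row]]) -> List[Row]:
    """Proposed behavior: sort each batch into a run as soon as it is received,
    then k-way merge the sorted runs once the input is exhausted."""
    runs: List[List[Row]] = []
    for batch in batches:
        # Sorting starts with the first batch, before later batches are consumed.
        runs.append(sorted(batch))
    return list(heapq.merge(*runs))


batches = [[(3, 0), (1, 1)], [(2, 2), (5, 3)], [(4, 4)]]
assert incremental_sort(batches) == blocking_sort(batches)
```

      Both functions return the same rows in the same order. The only difference is when the sorting work happens. Merging many small runs has its own cost, so whether this pays off likely depends on batch size and on whether the sort shares a thread with its input, as the description notes.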

      Query

      insert into tpcds_1000_parquet.store_sales_insert partition(ss_sold_date_sk, ss_quantity) /*+ clustered*/
      select
        ss_sold_time_sk,
        ss_item_sk,
        ss_customer_sk,
        ss_cdemo_sk,
        ss_hdemo_sk,
        ss_addr_sk,
        ss_store_sk,
        ss_promo_sk,
        ss_ticket_number,
        ss_wholesale_cost,
        ss_list_price,
        ss_sales_price,
        ss_ext_discount_amt,
        ss_ext_sales_price,
        ss_ext_wholesale_cost,
        ss_ext_list_price,
        ss_ext_tax,
        ss_coupon_amt,
        ss_net_paid,
        ss_net_paid_inc_tax,
        ss_net_profit,
        ss_sold_date_sk,
        ss_quantity
      from store_sales
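      The `/*+ clustered*/` hint is what puts a sort on the partition keys in front of the table writer. Each writer then sees one partition's rows contiguously and can close that partition's file before opening the next. The plan estimates 180576 partitions, so this matters for writer memory.

      The Python sketch below only counts the peak number of open files for a single writer. `max_open_partitions` is a hypothetical helper, not an Impala API. Its `clustered=True` path assumes the input is already sorted on the partition key.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical partition key: (ss_sold_date_sk, ss_quantity).
PartitionKey = Tuple[int, int]


def max_open_partitions(keys: Iterable[PartitionKey], clustered: bool) -> int:
    """Return the peak number of partition files a single writer holds open.

    clustered=True assumes the input is sorted on the partition key, so the
    writer closes the current file as soon as the key changes.
    clustered=False keeps every partition seen so far open until the end.
    """
    open_files: Set[PartitionKey] = set()
    peak = 0
    current: Optional[PartitionKey] = None
    for key in keys:
        if clustered and key != current:
            open_files.clear()  # previous partition is complete; close its file
        open_files.add(key)
        current = key
        peak = max(peak, len(open_files))
    return peak


keys = [(1, 5), (2, 3), (1, 5), (3, 1), (2, 3)]
print(max_open_partitions(keys, clustered=False))         # 3
print(max_open_partitions(sorted(keys), clustered=True))  # 1
```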
      

      Plan

      WRITE TO HDFS [tpcds_1000_parquet.store_sales_insert, OVERWRITE=false, PARTITION-KEYS=(ss_sold_date_sk,ss_quantity)]
      |  partitions=180576
      |  hosts=15 per-host-mem=17.88GB
      |
      02:SORT
      |  order by: ss_sold_date_sk DESC NULLS LAST, ss_quantity DESC NULLS LAST
      |  hosts=15 per-host-mem=1.45GB
      |  tuple-ids=1 row-size=100B cardinality=2879987999
      |
      01:EXCHANGE [HASH(ss_sold_date_sk,ss_quantity)]
      |  hosts=15 per-host-mem=0B
      |  tuple-ids=0 row-size=100B cardinality=2879987999
      |
      00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]
         partitions=1824/1824 files=1824 size=189.24GB
         table stats: 2879987999 rows total
         column stats: all
         hosts=15 per-host-mem=88.00MB
         tuple-ids=0 row-size=100B cardinality=2879987999
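      In this plan, 01:EXCHANGE hash-partitions rows on (ss_sold_date_sk, ss_quantity), so all rows of one partition land on the same host. 02:SORT then orders each host's rows by both keys DESC NULLS LAST.

      Below is a minimal Python sketch of that per-host ordering. It assumes integer key columns, with NULL modeled as None. Python's built-in `hash()` stands in for Impala's own hash function. `sort_key`, `hash_exchange` and `run_plan` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical row shape: (ss_sold_date_sk, ss_quantity); None models SQL NULL.
Row = Tuple[Optional[int], Optional[int]]


def desc_nulls_last(value: Optional[int]) -> Tuple[bool, int]:
    """Sort key for one column ordered DESC NULLS LAST."""
    # False < True, so non-NULL values sort before NULLs; negating gives DESC.
    return (value is None, 0 if value is None else -value)


def sort_key(row: Row) -> Tuple[Tuple[bool, int], Tuple[bool, int]]:
    """ORDER BY ss_sold_date_sk DESC NULLS LAST, ss_quantity DESC NULLS LAST."""
    return (desc_nulls_last(row[0]), desc_nulls_last(row[1]))


def hash_exchange(rows: List[Row], num_hosts: int) -> Dict[int, List[Row]]:
    """Route each row to a host by hashing its partition key.

    Python's hash() is a stand-in; what matters is that equal keys always
    go to the same host within one run.
    """
    per_host: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        per_host[hash(row) % num_hosts].append(row)
    return per_host


def run_plan(rows: List[Row], num_hosts: int = 15) -> Dict[int, List[Row]]:
    """Hash exchange followed by an independent sort on each host."""
    return {host: sorted(host_rows, key=sort_key)
            for host, host_rows in hash_exchange(rows, num_hosts).items()}


rows = [(2450816, 5), (None, 3), (2450817, None), (2450817, 7), (2450816, 5)]
print(sorted(rows, key=sort_key))
# [(2450817, 7), (2450817, None), (2450816, 5), (2450816, 5), (None, 3)]
```

      Because a partition never spans two hosts, each host's sort output is already grouped by partition for its local writer.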
      

    People

      Assignee: Noemi Pap-Takacs (noemi)
      Reporter: Mostafa Mokhtar (mmokhtar)
      Votes: 0
      Watchers: 10