IMPALA-5343

Sort by Column(s) added as part of inserting into Kudu table is incorrect

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: Impala 2.9.0
    • Component/s: Frontend

      Description

      The planner includes KuduPartition(PARTITION_COLUMN) among the columns in the sort's order-by clause. The sort should instead match the primary key columns only.

      Plan

      Query: explain insert into lineitem_kudu_ts  select * from lineitem_kudu
      | INSERT INTO KUDU [scan_primitives_tpch_3tb.lineitem_kudu_ts]                                                                                                                    |
      | |                                                                                                                                                                               |
      | 02:SORT                                                                                                                                                                         |
      | |  order by: KuduPartition(scan_primitives_tpch_3tb.lineitem_kudu.l_orderkey) ASC NULLS LAST, l_shipdate ASC NULLS LAST, l_orderkey ASC NULLS LAST, l_linenumber ASC NULLS LAST |
      | |                                                                                                                                                                               |
      | 01:EXCHANGE [KUDU(KuduPartition(scan_primitives_tpch_3tb.lineitem_kudu.l_orderkey))]                                                                                            |
      | |                                                                                                                                                                               |
      | 00:SCAN KUDU [scan_primitives_tpch_3tb.lineitem_kudu]                                                                                                                           |
      

      DDL

      [vd1302.halxg.cloudera.com:21000] > show create table scan_primitives_tpch_3tb.lineitem_kudu_ts;
      Query: show create table scan_primitives_tpch_3tb.lineitem_kudu_ts
       CREATE TABLE scan_primitives_tpch_3tb.lineitem_kudu_ts (                                                
         l_shipdate STRING NOT NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                    
         l_orderkey BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                      
         l_linenumber BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                    
         l_partkey BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                       
         l_suppkey BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                       
         l_quantity DOUBLE NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                          
         l_extendedprice DOUBLE NULL ENCODING PLAIN_ENCODING COMPRESSION LZ4,                                  
         l_discount DOUBLE NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                          
         l_tax DOUBLE NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                               
         l_returnflag STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                      
         l_linestatus STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                      
         l_commitdate TIMESTAMP NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                     
         l_receiptdate STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                     
         l_shipinstruct STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                    
         l_shipmode STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                        
         l_comment STRING NULL ENCODING PLAIN_ENCODING COMPRESSION LZ4,                                        
         PRIMARY KEY (l_shipdate, l_orderkey, l_linenumber)                                                    
       )                                                                                                       
       PARTITION BY HASH (l_orderkey) PARTITIONS 140                                                           
       STORED AS KUDU                                                                                          
       TBLPROPERTIES ('kudu.master_addresses'='vd1301.halxg.cloudera.com:7051,vd1128.halxg.cloudera.com:7051') 
      

        Activity

        mjacobs Matthew Jacobs added a comment -

        The plan and the sort are correct. The "KuduPartition" expr is there because multiple partitions can end up at a given sink fragment, and we want the rows inserted into Kudu to be grouped per partition and then ordered by PK.
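
        The effect of the extra sort key can be illustrated with a small sketch. This is not Impala or Kudu code: `kudu_partition` is a hypothetical stand-in (a plain modulo) for the real KuduPartition() expr, `NUM_PARTITIONS` and the sample rows are invented, and only the three primary key columns are modeled. It compares the order in which a single sink would see partitions under each sort.

```python
from itertools import groupby

NUM_PARTITIONS = 4  # hypothetical; the table in this issue uses 140


def kudu_partition(l_orderkey: int) -> int:
    """Stand-in for KuduPartition(); NOT Kudu's real hash function."""
    return l_orderkey % NUM_PARTITIONS


# (l_shipdate, l_orderkey, l_linenumber): the primary key columns, in PK order.
rows = [
    ("1995-03-02", 9, 1),
    ("1994-01-15", 6, 1),
    ("1993-11-30", 5, 1),
    ("1995-03-02", 6, 2),
    ("1994-06-01", 5, 2),
]


def partition_runs(sorted_rows):
    """Partition ids in arrival order, with consecutive repeats collapsed."""
    return [p for p, _ in groupby(sorted_rows, key=lambda r: kudu_partition(r[1]))]


# Sort on the primary key alone, as the issue description proposes.
by_pk_only = sorted(rows)

# Sort on (partition, primary key), as the plan in this issue does.
by_partition_then_pk = sorted(rows, key=lambda r: (kudu_partition(r[1]), r))

print(partition_runs(by_pk_only))            # [1, 2, 1, 2, 1]
print(partition_runs(by_partition_then_pk))  # [1, 2]
```

        In this toy data, sorting on the primary key alone makes the sink alternate between partitions, while the (partition, PK) sort delivers each partition as one contiguous, PK-ordered run.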

        dknupp David Knupp added a comment -

        Reopening momentarily to change the resolution.


          People

          • Assignee: twmarshall Thomas Tauber-Marshall
          • Reporter: mmokhtar Mostafa Mokhtar