  1. IMPALA
  2. IMPALA-4034

Add table property to indicate Sort by column

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      Store data on disk in a "locally" sorted order according to the sort columns specified in the table property.

      All inserts into the target table that go through Impala will include a sort operator that locally sorts the data by the sort key columns specified in the table properties.

      This lets Impala skip entire blocks of data when a query filters on a sort column: the Parquet writer records the minimum and maximum column values stored in each block, so a scan can skip any block whose value range does not overlap the predicate range.

      CREATE TABLE customer (        
        c_customer_sk INT,                
        c_customer_id STRING,             
        c_current_cdemo_sk INT,           
        c_current_hdemo_sk INT,           
        c_current_addr_sk INT,            
        c_first_shipto_date_sk INT,       
        c_first_sales_date_sk INT,        
        c_salutation STRING,              
        c_first_name STRING,              
        c_last_name STRING,               
        c_preferred_cust_flag STRING,     
        c_birth_day INT,                  
        c_birth_month INT,                
        c_birth_year INT,                 
        c_birth_country STRING,           
        c_login STRING,                   
        c_email_address STRING,           
        c_last_review_date STRING
      )   
      TBLPROPERTIES ('sort_columns'='c_last_name,c_first_name');
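
The block-skipping idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Impala's or Parquet's actual implementation: the names `Block`, `write_blocks`, and `scan` are hypothetical, a "block" is modelled as a fixed-size list of rows, and only the first sort column (c_last_name) is tracked in the min/max statistics.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[str, str]  # (c_last_name, c_first_name), hypothetical sample rows


@dataclass
class Block:
    rows: List[Row]
    min_key: str  # smallest c_last_name stored in this block
    max_key: str  # largest c_last_name stored in this block


def write_blocks(rows: List[Row], block_size: int, sort_first: bool) -> List[Block]:
    """Split one writer's rows into blocks, recording min/max per block.

    With sort_first=True the rows are sorted before being written, which
    mimics the proposed local sort on (c_last_name, c_first_name).
    """
    if sort_first:
        rows = sorted(rows)
    blocks = []
    for i in range(0, len(rows), block_size):
        chunk = rows[i:i + block_size]
        last_names = [r[0] for r in chunk]
        blocks.append(Block(chunk, min(last_names), max(last_names)))
    return blocks


def scan(blocks: List[Block], lo: str, hi: str) -> Tuple[List[Row], int]:
    """Return rows with lo <= c_last_name <= hi and the number of blocks read."""
    matches: List[Row] = []
    blocks_read = 0
    for b in blocks:
        # Skip the block if its [min, max] range cannot overlap [lo, hi].
        if b.max_key < lo or b.min_key > hi:
            continue
        blocks_read += 1
        matches.extend(r for r in b.rows if lo <= r[0] <= hi)
    return matches, blocks_read


rows = [
    ("Smith", "Ann"), ("Adams", "Bob"), ("Young", "Cy"), ("Baker", "Di"),
    ("Stone", "Ed"), ("Clark", "Flo"), ("Zhang", "Gus"), ("Davis", "Hal"),
]

unsorted_matches, unsorted_reads = scan(write_blocks(rows, 2, sort_first=False), "Smith", "Stone")
sorted_matches, sorted_reads = scan(write_blocks(rows, 2, sort_first=True), "Smith", "Stone")
```

With this sample data both scans return the same two rows, but the unsorted layout has to read all four blocks while the sorted layout reads only one, because sorting makes the per-block value ranges narrow and non-overlapping.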
      

          People

            Assignee: Unassigned
            Reporter: Mostafa Mokhtar (mmokhtar)
