IMPALA-4163: Introduce SORTBY plan hint for insert statements

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: Impala 2.2, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0
    • Fix Version/s: None
    • Component/s: Frontend

      Description

      To improve compression and/or the effectiveness of min/max pruning, it is desirable to control the order in which rows are inserted into a table (mostly for Parquet).

      To that end, we should introduce a "sortby" plan hint for insert statements. Example:

      CREATE TABLE dst (...);
      INSERT INTO dst /*+ sortby(day,hour) */ SELECT * FROM src;
      

      This would produce the following plan:
      SCAN -> SORT(day,hour) -> TABLE SINK
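      The effect of the SORT step on min/max pruning can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Impala code: the row-group size of 72 rows, the synthetic day/hour data, and the helper names row_group_stats and groups_scanned are all hypothetical. It models a file as consecutive row groups that each record the min and max of "day", and counts how many groups a filter on a single day would have to read.

```python
import random


def row_group_stats(rows, group_size):
    """Split rows into consecutive groups; return (min, max) of 'day' per group."""
    groups = [rows[i:i + group_size] for i in range(0, len(rows), group_size)]
    return [(min(r["day"] for r in g), max(r["day"] for r in g)) for g in groups]


def groups_scanned(stats, day):
    """Count groups whose [min, max] range could contain the requested day."""
    return sum(1 for lo, hi in stats if lo <= day <= hi)


# Synthetic source table: 30 days x 24 hours = 720 rows, in random order.
random.seed(0)
rows = [{"day": d, "hour": h} for d in range(1, 31) for h in range(24)]
random.shuffle(rows)

# Without sorting, most groups span a wide range of days.
unsorted_stats = row_group_stats(rows, group_size=72)

# With the equivalent of sortby(day,hour), each group covers a narrow range.
sorted_rows = sorted(rows, key=lambda r: (r["day"], r["hour"]))
sorted_stats = row_group_stats(sorted_rows, group_size=72)

print("groups read without sort:", groups_scanned(unsorted_stats, day=15))
print("groups read with sort:   ", groups_scanned(sorted_stats, day=15))
```

      With sorted input, each group of 72 rows covers exactly three consecutive days, so a filter on one day touches a single group. With shuffled input, the per-group ranges overlap heavily and far fewer groups can be skipped.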

      Syntax and behavior

       INSERT INTO dst /*+ sortby(day,hour) */ SELECT * FROM src; 
      • We will not support the legacy hint style with brackets:
        [sortby(day,hour)]
      • To keep the "clustered" hint strictly separate from the "sortby" hint, it is only legal to use non-partition columns in "sortby" for HDFS tables.
      • Similarly, it is only legal to mention non-primary-key columns of Kudu tables.
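      The two column restrictions above could be checked along the following lines. This is a hypothetical sketch rather than the Impala frontend's actual analysis code: the function name check_sortby_columns, the "hdfs"/"kudu" table-kind strings, and the error wording are assumptions made for illustration.

```python
def check_sortby_columns(sort_cols, table_kind, partition_cols=(), primary_key_cols=()):
    """Return a list of error messages for columns that may not appear in sortby.

    Hypothetical rule set taken from the bullets above:
      - HDFS tables: partition columns are not allowed.
      - Kudu tables: primary-key columns are not allowed.
    """
    errors = []
    for col in sort_cols:
        if table_kind == "hdfs" and col in partition_cols:
            errors.append(f"sortby: partition column not allowed: {col}")
        elif table_kind == "kudu" and col in primary_key_cols:
            errors.append(f"sortby: primary-key column not allowed: {col}")
    return errors


# HDFS table partitioned by 'year': sorting by day and hour is accepted.
print(check_sortby_columns(["day", "hour"], "hdfs", partition_cols=["year"]))

# Using the partition column itself is rejected.
print(check_sortby_columns(["year", "day"], "hdfs", partition_cols=["year"]))

# Kudu table with primary key 'id': 'id' is rejected, 'day' is accepted.
print(check_sortby_columns(["id", "day"], "kudu", primary_key_cols=["id"]))
```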

              People

              • Assignee: Lars Volker (lv)
              • Reporter: Alexander Behm (alex.behm)