IMPALA-6710: Docs around INSERT into partitioned tables are misleading

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: Impala 2.12.0
    • Component/s: Docs

      Description

      Impala's INSERT statement has an optional "partition" clause where partition columns can be specified.

      This clause must be used for static partitioning, i.e. where the partition value is specified after the column:

      > insert into t1 partition(x=10, y='a') select c1 from some_other_table;
      

      But it is not required for dynamic partitioning, e.g. the following inserts are equivalent:

      > create table foo (c string) partitioned by (p int);
      > insert into foo (p, c) values (0, 'c');
      > insert into foo (c) partition(p) values ('c', 0);
      > insert into foo partition(p) values ('c', 0);
      

      and note:

      • values are assigned to columns in the order the columns appear in the SQL, hence the order of 'c' and 0 being flipped between the first two examples
      • when a partition clause is specified but the other columns are omitted, as in the third example, the other columns are treated as though they had all been specified before the partition clause in the SQL

      Confusingly, though, the partition columns must be mentioned somewhere in the statement, e.g.:

      > insert into foo values ('c', 1);
      

      would be valid for a non-partitioned table whose number and types of columns match the values clause, but can never be valid for a partitioned table.
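
As a minimal sketch of the rule described above, assuming the table foo (c string) partitioned by (p int) from the earlier example, the following contrasts the rejected form with two forms that do name the partition column. The comments restate the behavior described in this issue; they are not output captured from a real cluster.

```sql
-- Rejected for a partitioned table: the partition column p is never mentioned.
-- insert into foo values ('c', 1);

-- Dynamic partitioning: p is named in the column list, so its value comes from the values clause.
insert into foo (c, p) values ('c', 1);

-- Static partitioning: p is given a constant in the partition clause, so only c is supplied.
insert into foo partition(p=1) values ('c');
```

Both accepted statements are expected to write a row with c = 'c' into the partition p = 1.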

      The docs on this are not very clear:
      http://impala.apache.org/docs/build/html/topics/impala_insert.html
      They seem to indicate that partition columns must be specified in the "partition" clause, e.g. the sentence:

      Inserting data into partitioned tables requires slightly different syntax that divides the partitioning columns from the others: 
      

      and the examples that follow it.

            People

            • Assignee:
              arodoni_cloudera Alex Rodoni
              Reporter:
              twmarshall Thomas Tauber-Marshall