Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Impala 2.12.0
Description
Impala's INSERT statement has an optional "partition" clause where partition columns can be specified.
This clause must be used for static partitioning, i.e. where the partition value is specified after the column:
> insert into t1 partition(x=10, y='a') select c1 from some_other_table;
But it is not required for dynamic partitioning, e.g. the following inserts are equivalent:
> create table foo (c string) partitioned by (p int);
> insert into foo (p, c) values (0, 'c');
> insert into foo (c) partition(p) values ('c', 0);
> insert into foo partition(p) values ('c', 0);
Note that:
- the values are assigned to columns in the order the columns appear in the SQL, hence the order of 'c' and 0 being flipped between the first two inserts
- when a partition clause is specified but the other columns are omitted, as in the third insert, the other columns are treated as though they had all been specified before the partition clause in the SQL
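The two ordering rules above can be sketched in a few lines of Python. This is only a model of the behaviour described in this ticket, not Impala's analyzer: the function names `target_columns` and `bind` are hypothetical, and the table layout (one data column `c`, one partition column `p`) is taken from the example above.

```python
def target_columns(data_columns, column_list=None, partition_clause=None):
    """Return the column order that the VALUES expressions are mapped onto.

    Hypothetical model of the rules in this ticket:
    - an explicit column list is used in the order written;
    - if it is omitted, all non-partition columns are assumed, in table order;
    - columns named in the partition clause always come after the column list.
    """
    partition_clause = list(partition_clause or [])
    if column_list is None:
        column_list = list(data_columns)
    return list(column_list) + partition_clause


def bind(columns, values):
    """Pair each target column with the value in the same position."""
    if len(columns) != len(values):
        raise ValueError("column/value count mismatch")
    return dict(zip(columns, values))


# insert into foo (p, c) values (0, 'c');
first = bind(target_columns(["c"], column_list=["p", "c"]), (0, "c"))
# insert into foo (c) partition(p) values ('c', 0);
second = bind(target_columns(["c"], column_list=["c"], partition_clause=["p"]), ("c", 0))
# insert into foo partition(p) values ('c', 0);
third = bind(target_columns(["c"], partition_clause=["p"]), ("c", 0))

# All three statements end up writing the same row.
assert first == second == third == {"c": "c", "p": 0}
```

Under this model the three inserts differ only in where the partition column is named, which is why they produce the same row.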
Confusingly, though, the partition columns must be mentioned in the statement in some form, e.g.:
> insert into foo values ('c', 1);
would be valid for a non-partitioned table, so long as the number and types of its columns match the values clause, but can never be valid for a partitioned table.
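As a rough illustration of that requirement, the check below reports which partition columns a statement fails to mention. It is a sketch of the rule as described in this ticket, with a hypothetical function name; it does not reflect how Impala actually implements the check.

```python
def unmentioned_partition_columns(partition_columns, column_list=None, partition_clause=None):
    """Return the partition columns named neither in the column list nor in the
    partition clause. Per this ticket, a non-empty result means the INSERT is
    rejected. (Hypothetical helper, for illustration only.)
    """
    mentioned = set(column_list or []) | set(partition_clause or [])
    return [p for p in partition_columns if p not in mentioned]


# insert into foo values ('c', 1);  -- foo is partitioned by p, and p is never mentioned
assert unmentioned_partition_columns(["p"]) == ["p"]

# insert into foo partition(p) values ('c', 0);
assert unmentioned_partition_columns(["p"], partition_clause=["p"]) == []

# insert into foo (p, c) values (0, 'c');
assert unmentioned_partition_columns(["p"], column_list=["p", "c"]) == []

# The same bare VALUES form against a non-partitioned table has nothing to miss.
assert unmentioned_partition_columns([]) == []
```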
The docs around this are not very clear:
http://impala.apache.org/docs/build/html/topics/impala_insert.html
and seem to indicate that partition columns must be specified in the "partition" clause, e.g. the sentence:
"Inserting data into partitioned tables requires slightly different syntax that divides the partitioning columns from the others:"
and the examples that follow it.