Details
Bug
Status: Resolved
Major
Resolution: Fixed
Impala 2.6.0
None
Description
I have some users who used Impala to convert all partitions and the table itself to Parquet. Impala still reports ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' after the file format is set to Parquet.
[ip-172-31-44-64.us-west-2.compute.internal:21000] > create table part4 (c1 int) partitioned by (p1 string) row format delimited fields terminated by ',';
Query: create table part4 (c1 int) partitioned by (p1 string) row format delimited fields terminated by ','
Fetched 0 row(s) in 0.07s
[ip-172-31-44-64.us-west-2.compute.internal:21000] > show create table part4;
Query: show create table part4
+---------------------------------------------------------------------------------------------+
| result                                                                                      |
+---------------------------------------------------------------------------------------------+
| CREATE TABLE default.part4 (                                                                |
|   c1 INT                                                                                    |
| )                                                                                           |
| PARTITIONED BY (                                                                            |
|   p1 STRING                                                                                 |
| )                                                                                           |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY ','                                               |
| WITH SERDEPROPERTIES ('serialization.format'=',', 'field.delim'=',')                        |
| STORED AS TEXTFILE                                                                          |
| LOCATION 'hdfs://ip-172-31-44-63.us-west-2.compute.internal:8020/user/hive/warehouse/part4' |
| TBLPROPERTIES ('transient_lastDdlTime'='1474291598')                                        |
+---------------------------------------------------------------------------------------------+
Fetched 1 row(s) in 2.96s
[ip-172-31-44-64.us-west-2.compute.internal:21000] > alter table part4 set fileformat PARQUET;
Query: alter table part4 set fileformat PARQUET
[ip-172-31-44-64.us-west-2.compute.internal:21000] > show create table part4;
Query: show create table part4
+---------------------------------------------------------------------------------------------+
| result                                                                                      |
+---------------------------------------------------------------------------------------------+
| CREATE TABLE default.part4 (                                                                |
|   c1 INT                                                                                    |
| )                                                                                           |
| PARTITIONED BY (                                                                            |
|   p1 STRING                                                                                 |
| )                                                                                           |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY ','                                               |
| WITH SERDEPROPERTIES ('serialization.format'=',', 'field.delim'=',')                        |
| STORED AS PARQUET                                                                           |
| LOCATION 'hdfs://ip-172-31-44-63.us-west-2.compute.internal:8020/user/hive/warehouse/part4' |
| TBLPROPERTIES ('transient_lastDdlTime'='1474291612')                                        |
+---------------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.01s
But worse, when you use dynamic partitioning to create a new partition, it creates the new partition as text:
[ip-172-31-44-64.us-west-2.compute.internal:21000] > insert into part4 partition(p1) select 1, "v2" from part4 limit 1;
Query: insert into part4 partition(p1) select 1, "v2" from part4 limit 1
Inserted 1 row(s) in 0.52s
[ip-172-31-44-64.us-west-2.compute.internal:21000] > Goodbye centos
[centos@ip-172-31-44-63 kudu]$ hadoop fs -ls -R /user/hive/warehouse/part4/
drwxrwxrwt - impala hive 0 2016-09-21 20:10 /user/hive/warehouse/part4/_impala_insert_staging
drwxrwxrwt - impala hive 0 2016-09-21 20:08 /user/hive/warehouse/part4/p1=v1
-rw-r--r-- 3 impala hive 246 2016-09-21 20:08 /user/hive/warehouse/part4/p1=v1/5146e1ec2606109c-e9160d3a568a9fbf_121715392_data.0.parq
-rw-r--r-- 3 impala hive 246 2016-09-21 20:07 /user/hive/warehouse/part4/p1=v1/ce48fed13925cb43-7d2925990599f2ae_726069818_data.0.parq
drwxr-xr-x - impala hive 0 2016-09-21 20:10 /user/hive/warehouse/part4/p1=v2
-rw-r--r-- 3 impala hive 2 2016-09-21 20:10 /user/hive/warehouse/part4/p1=v2/ab4a9ee6fbb020a0-1b173553935d7da0_488217750_data.0.
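For anyone triaging a table in this state, the listing above can be scanned for data files that lack the `.parq` suffix seen on the existing Parquet files. The following is a minimal sketch, not part of Impala or Hadoop: the function name `non_parquet_files` is hypothetical, it assumes the eight-column `hadoop fs -ls -R` layout shown in this report, and it treats the suffix only as a naming hint, not as proof of the file's format.

```python
def non_parquet_files(listing, suffix=".parq"):
    """Return (size, path) for regular files whose names lack `suffix`.

    `listing` is the text output of `hadoop fs -ls -R`, assumed to have the
    columns: perms, replication, owner, group, size, date, time, path.
    """
    flagged = []
    for line in listing.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0].startswith("d"):
            continue  # skip malformed lines and directories
        size, path = int(fields[4]), fields[7]
        if not path.endswith(suffix):
            flagged.append((size, path))
    return flagged
```

Applied to the listing in this report, the only file it flags is the 2-byte file under `p1=v2`.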
Also, you cannot select from the table:
[ip-172-31-44-64.us-west-2.compute.internal:21000] > select * from part4;
Query: select * from part4
ERROR: Parquet file hdfs://ip-172-31-44-63.us-west-2.compute.internal:8020/user/hive/warehouse/part4/p1=v2/ab4a9ee6fbb020a0-1b173553935d7da0_488217750_data.0. has an invalid file length: 2
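The error is what one would expect when a text file is read as Parquet. A Parquet file begins and ends with the 4-byte magic `PAR1`, and a 4-byte footer length sits just before the trailing magic, so no valid file can be shorter than 12 bytes. A 2-byte file is consistent with a single text row such as `1` followed by a newline. The sketch below is not Impala's code: the helper name `looks_like_parquet` is hypothetical, and the check covers only the file size and magic bytes of a local copy of the file.

```python
import os

PARQUET_MAGIC = b"PAR1"
# 4-byte header magic + 4-byte footer length + 4-byte trailing magic.
MIN_PARQUET_LEN = 12


def looks_like_parquet(path):
    """Return (ok, reason) based only on file size and the PAR1 magic bytes."""
    size = os.path.getsize(path)
    if size < MIN_PARQUET_LEN:
        return False, f"invalid file length: {size}"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes of the file
        tail = f.read(4)
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        return False, "missing PAR1 magic"
    return True, "ok"
```

A file fetched with `hadoop fs -get` can be passed to this helper. It does not validate the footer metadata, so a file that passes may still be unreadable.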