IMPALA / IMPALA-4155

Table converted from text to parquet still creates partitions in text

Details

    Description

      I have some users who used Impala to convert a table and all of its partitions to Parquet. Impala still reports ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' after the file format is set to Parquet.

      [ip-172-31-44-64.us-west-2.compute.internal:21000] > create table part4 (c1 int) partitioned by (p1 string) row format delimited fields terminated by ',';
      Query: create table part4 (c1 int) partitioned by (p1 string) row format delimited fields terminated by ','
      
      Fetched 0 row(s) in 0.07s
      [ip-172-31-44-64.us-west-2.compute.internal:21000] > show create table part4;
      Query: show create table part4
      +---------------------------------------------------------------------------------------------+
      | result                                                                                      |
      +---------------------------------------------------------------------------------------------+
      | CREATE TABLE default.part4 (                                                                |
      |   c1 INT                                                                                    |
      | )                                                                                           |
      | PARTITIONED BY (                                                                            |
      |   p1 STRING                                                                                 |
      | )                                                                                           |
      | ROW FORMAT DELIMITED FIELDS TERMINATED BY ','                                               |
      | WITH SERDEPROPERTIES ('serialization.format'=',', 'field.delim'=',')                        |
      | STORED AS TEXTFILE                                                                          |
      | LOCATION 'hdfs://ip-172-31-44-63.us-west-2.compute.internal:8020/user/hive/warehouse/part4' |
      | TBLPROPERTIES ('transient_lastDdlTime'='1474291598')                                        |
      +---------------------------------------------------------------------------------------------+
      Fetched 1 row(s) in 2.96s
      [ip-172-31-44-64.us-west-2.compute.internal:21000] > alter table part4 set fileformat PARQUET;
      Query: alter table part4 set fileformat PARQUET
      [ip-172-31-44-64.us-west-2.compute.internal:21000] > show create table part4;
      Query: show create table part4
      +---------------------------------------------------------------------------------------------+
      | result                                                                                      |
      +---------------------------------------------------------------------------------------------+
      | CREATE TABLE default.part4 (                                                                |
      |   c1 INT                                                                                    |
      | )                                                                                           |
      | PARTITIONED BY (                                                                            |
      |   p1 STRING                                                                                 |
      | )                                                                                           |
      | ROW FORMAT DELIMITED FIELDS TERMINATED BY ','                                               |
      | WITH SERDEPROPERTIES ('serialization.format'=',', 'field.delim'=',')                        |
      | STORED AS PARQUET                                                                           |
      | LOCATION 'hdfs://ip-172-31-44-63.us-west-2.compute.internal:8020/user/hive/warehouse/part4' |
      | TBLPROPERTIES ('transient_lastDdlTime'='1474291612')                                        |
      +---------------------------------------------------------------------------------------------+
      Fetched 1 row(s) in 0.01s
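
The leftover text-format clauses in the second SHOW CREATE TABLE result can be detected mechanically. The following is a minimal sketch, not part of Impala: the function name `stale_text_clauses` and the list of markers are assumptions chosen for illustration, and the input is assumed to be the raw impala-shell output as pasted above.

```python
import re


def stale_text_clauses(show_create_output: str) -> list[str]:
    """Return delimited-text clauses still present in a table stored as Parquet."""
    # Drop the "|" borders and padding that impala-shell draws around each row.
    lines = [ln.strip().strip("|").strip() for ln in show_create_output.splitlines()]
    # Ignore empty lines and the "+-----+" separator rows.
    ddl = [ln for ln in lines if ln and not ln.startswith("+")]

    # Only a table declared as Parquet can have "stale" text clauses.
    if not any(re.match(r"STORED AS PARQUET\b", ln, re.IGNORECASE) for ln in ddl):
        return []

    # Markers assumed (for this sketch) to indicate delimited-text settings.
    markers = ("row format delimited", "'field.delim'", "'serialization.format'")
    return [ln for ln in ddl if any(m in ln.lower() for m in markers)]


SAMPLE = """
+----------------------------------------------------------------------+
| CREATE TABLE default.part4 (                                         |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY ','                        |
| WITH SERDEPROPERTIES ('serialization.format'=',', 'field.delim'=',') |
| STORED AS PARQUET                                                    |
+----------------------------------------------------------------------+
"""

if __name__ == "__main__":
    for clause in stale_text_clauses(SAMPLE):
        print(clause)
```

Run against the first SHOW CREATE TABLE result (STORED AS TEXTFILE), the function returns an empty list, because the delimited-text clauses are expected there.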
      

      Worse, when an INSERT with dynamic partitioning creates a new partition, Impala writes that partition as text:

      [ip-172-31-44-64.us-west-2.compute.internal:21000] > insert into part4 partition(p1) select 1, "v2" from part4 limit 1;                                                                 
      Query: insert into part4 partition(p1) select 1, "v2" from part4 limit 1
      Inserted 1 row(s) in 0.52s
      [ip-172-31-44-64.us-west-2.compute.internal:21000] > Goodbye centos
      [centos@ip-172-31-44-63 kudu]$ hadoop fs -ls -R /user/hive/warehouse/part4/
      drwxrwxrwt   - impala hive          0 2016-09-21 20:10 /user/hive/warehouse/part4/_impala_insert_staging
      drwxrwxrwt   - impala hive          0 2016-09-21 20:08 /user/hive/warehouse/part4/p1=v1
      -rw-r--r--   3 impala hive        246 2016-09-21 20:08 /user/hive/warehouse/part4/p1=v1/5146e1ec2606109c-e9160d3a568a9fbf_121715392_data.0.parq
      -rw-r--r--   3 impala hive        246 2016-09-21 20:07 /user/hive/warehouse/part4/p1=v1/ce48fed13925cb43-7d2925990599f2ae_726069818_data.0.parq
      drwxr-xr-x   - impala hive          0 2016-09-21 20:10 /user/hive/warehouse/part4/p1=v2
      -rw-r--r--   3 impala hive          2 2016-09-21 20:10 /user/hive/warehouse/part4/p1=v2/ab4a9ee6fbb020a0-1b173553935d7da0_488217750_data.0.
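
The listing above can be scanned for data files that do not carry the `.parq` suffix seen on the files in `p1=v1`. This is a rough sketch under stated assumptions: the helper name `non_parquet_data_files` is hypothetical, paths are assumed to contain no spaces, and the `.parq` suffix is only a heuristic, since Parquet files written by other tools may use a different extension or none.

```python
def non_parquet_data_files(ls_output: str) -> list[tuple[str, int]]:
    """Return (path, size) for regular files whose name does not end in .parq."""
    suspects = []
    for line in ls_output.splitlines():
        fields = line.split()
        # A file entry from `hadoop fs -ls -R` has 8 whitespace-separated fields:
        # permissions, replication, owner, group, size, date, time, path.
        # Directory entries start with "d" and are skipped.
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue
        size, path = int(fields[4]), fields[7]
        if not path.endswith(".parq"):
            suspects.append((path, size))
    return suspects


LISTING = """\
drwxrwxrwt   - impala hive          0 2016-09-21 20:10 /user/hive/warehouse/part4/_impala_insert_staging
drwxrwxrwt   - impala hive          0 2016-09-21 20:08 /user/hive/warehouse/part4/p1=v1
-rw-r--r--   3 impala hive        246 2016-09-21 20:08 /user/hive/warehouse/part4/p1=v1/5146e1ec2606109c-e9160d3a568a9fbf_121715392_data.0.parq
-rw-r--r--   3 impala hive        246 2016-09-21 20:07 /user/hive/warehouse/part4/p1=v1/ce48fed13925cb43-7d2925990599f2ae_726069818_data.0.parq
drwxr-xr-x   - impala hive          0 2016-09-21 20:10 /user/hive/warehouse/part4/p1=v2
-rw-r--r--   3 impala hive          2 2016-09-21 20:10 /user/hive/warehouse/part4/p1=v2/ab4a9ee6fbb020a0-1b173553935d7da0_488217750_data.0.
"""

if __name__ == "__main__":
    for path, size in non_parquet_data_files(LISTING):
        print(f"{size:>6}  {path}")
```

On this listing the only file flagged is the 2-byte file under `p1=v2`.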
      

      Also, you can no longer select from the table:

      [ip-172-31-44-64.us-west-2.compute.internal:21000] > select * from part4;
      Query: select * from part4
      ERROR: 
      Parquet file hdfs://ip-172-31-44-63.us-west-2.compute.internal:8020/user/hive/warehouse/part4/p1=v2/ab4a9ee6fbb020a0-1b173553935d7da0_488217750_data.0. has an invalid file length: 2
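
The "invalid file length: 2" message is consistent with how the Parquet format frames a file: an unencrypted Parquet file starts and ends with the 4-byte magic `PAR1`, and a 4-byte footer length precedes the trailing magic, so a valid file cannot be shorter than 12 bytes. The sketch below illustrates that framing check on a local copy of a file (for example one fetched with `hadoop fs -get`). The function names are hypothetical, and the exact check Impala performs is not shown in this report.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Leading magic (4 bytes) + footer length field (4 bytes) + trailing magic (4 bytes).
MIN_PARQUET_SIZE = 12


def has_parquet_framing(size: int, head: bytes, tail: bytes) -> bool:
    """Check the size and the first and last 4 bytes against the Parquet framing."""
    return size >= MIN_PARQUET_SIZE and head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def looks_like_parquet(path: str) -> bool:
    """Apply the framing check to a local file."""
    size = os.path.getsize(path)
    if size < MIN_PARQUET_SIZE:
        return False  # too short to hold both magics and the footer length
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return has_parquet_framing(size, head, tail)
```

A 2-byte file, such as a text file holding a single short row, fails this check on length alone, before any footer can be read.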
      

          People

            tarasbob Taras Bobrovytsky
            flumeqa Flume QA