Parquet / PARQUET-723

Parquet is not storing the type for the column

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-format
    • Labels: None

    Description

      1. Create Text file format table
      CREATE EXTERNAL TABLE IF NOT EXISTS emp(
      id INT,
      first_name STRING,
      last_name STRING,
      dateofBirth STRING,
      join_date INT
      )
      COMMENT 'This is Employee Table Date Of Birth of type String'
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE
      LOCATION '/user/employee/beforePartition';

      2. Load the data into table
      load data inpath '/user/somupoc_timestamp/employeeData_partitioned.csv' into table emp;
      select * from emp;

      3. Create a partitioned table stored as Parquet (dateofBirth STRING)

      create external table emp_afterpartition(
      id int, first_name STRING, last_name STRING, dateofBirth STRING)
      COMMENT 'Employee partitioned table with dateofBirth of type string'
      partitioned by (join_date int)
      STORED as parquet
      LOCATION '/user/employee/afterpartition';

      4. Populate the partitioned table from emp using dynamic partitioning, then read it back

      set hive.exec.dynamic.partition=true;
      set hive.exec.dynamic.partition.mode=nonstrict;
      insert overwrite table emp_afterpartition partition (join_date) select * from emp;
      select * from emp_afterpartition;

      5. Create a second partitioned table stored as Parquet at the same location (dateofBirth TIMESTAMP)

      CREATE EXTERNAL TABLE IF NOT EXISTS employee_afterpartition_timestamp_parq(
      id INT,first_name STRING,last_name STRING,dateofBirth TIMESTAMP)
      COMMENT 'employee partitioned table with dateofBirth of type TIMESTAMP'
      PARTITIONED BY (join_date INT)
      STORED AS PARQUET
      LOCATION '/user/employee/afterpartition';

      select * from employee_afterpartition_timestamp_parq;
      -- 0 records returned, because the existing partitions are not yet registered for this table

      Register the partitions (RECOVER PARTITIONS works in Impala; MSCK REPAIR TABLE, the metastore check command with the repair table option, works in Hive):
      Impala :: alter table employee_afterpartition_timestamp_parq RECOVER PARTITIONS;
      Hive :: MSCK REPAIR TABLE employee_afterpartition_timestamp_parq;

      select * from employee_afterpartition_timestamp_parq;

      Actual Result :: Failed with exception java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

      Expected Result :: The data should be displayed.

      Note: If the file format is text file instead of Parquet, I am able to fetch the data.
      Observation :: Two tables with different column types (dateofBirth STRING vs. dateofBirth TIMESTAMP) point to the same HDFS location.
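
      One reading of the exception above: Parquet files record each column's physical type in the file footer, so the files written through emp_afterpartition hold dateofBirth as string data, and declaring the column as TIMESTAMP in a second table over the same location does not convert them. A possible workaround, sketched here in HiveQL, is to give the TIMESTAMP table its own location and populate it with an explicit cast. The table name emp_afterpartition_ts and the path '/user/employee/afterpartition_timestamp' are hypothetical, not taken from the report. How CAST treats values that are not in 'yyyy-MM-dd HH:mm:ss' form (for example 2016/08/20 or 09/19/2016 in the sample data) depends on the Hive version; such values typically come back as NULL.

```sql
-- Sketch only: table name and LOCATION are hypothetical placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS emp_afterpartition_ts(
  id INT,
  first_name STRING,
  last_name STRING,
  dateofBirth TIMESTAMP)
COMMENT 'Employee partitioned table with dateofBirth of type TIMESTAMP, stored separately'
PARTITIONED BY (join_date INT)
STORED AS PARQUET
LOCATION '/user/employee/afterpartition_timestamp';

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Rewrite the data so the new Parquet files actually store a timestamp column.
-- The dynamic partition column (join_date) must come last in the SELECT list.
INSERT OVERWRITE TABLE emp_afterpartition_ts PARTITION (join_date)
SELECT
  id,
  first_name,
  last_name,
  CAST(dateofBirth AS TIMESTAMP) AS dateofBirth,
  join_date
FROM emp_afterpartition;

SELECT * FROM emp_afterpartition_ts;
```

      Keeping the two tables in separate locations means each set of Parquet files matches the schema of the table that reads it.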

      Sample Data
      ===========

      1,Joyce,Garza,2016-07-17 14:42:18,201607
      2,Jerry,Ortiz,2016-08-17 21:36:54,201608
      3,Steven,Ryan,2016-09-10 01:32:40,201609
      4,Lisa,Black,2015-10-12 15:05:13,201610
      5,Jose,Turner,2015-011-10 06:38:40,201611
      6,Joyce,Garza,2016-08-02,201608
      7,Jerry,Ortiz,2016-01-01,201601
      8,Steven,Ryan,2016/08/20,201608
      9,Lisa,Black,2016/09/12,201609
      10,Jose,Turner,09/19/2016,201609
      11,Jose,Turner,20160915,201609

          People

            Assignee: Unassigned
            Reporter: Narasimha (narasimhasomu)