[PARQUET-723] parquet is not storing the type for the column. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: parquet-format
Labels: None

Description

1. Create a table stored as a text file
CREATE EXTERNAL TABLE IF NOT EXISTS emp(
id INT,
first_name STRING,
last_name STRING,
dateofBirth STRING,
join_date INT
)
COMMENT 'This is Employee Table Date Of Birth of type String'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/employee/beforePartition';

2. Load the data into the table
load data inpath '/user/somupoc_timestamp/employeeData_partitioned.csv' into table emp;
select * from emp;

3. Create a partitioned table stored as Parquet (dateofBirth STRING)

create external table emp_afterpartition(
id int, first_name STRING, last_name STRING, dateofBirth STRING)
COMMENT 'Employee partitioned table with dateofBirth of type string'
partitioned by (join_date int)
STORED as parquet
LOCATION '/user/employee/afterpartition';

4. Load the data into the partitioned table and fetch it

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table emp_afterpartition partition (join_date) select * from emp;
select * from emp_afterpartition;

5. Create a second partitioned table stored as Parquet over the same location (dateofBirth TIMESTAMP)

CREATE EXTERNAL TABLE IF NOT EXISTS employee_afterpartition_timestamp_parq(
id INT,first_name STRING,last_name STRING,dateofBirth TIMESTAMP)
COMMENT 'employee partitioned table with dateofBirth of type TIMESTAMP'
PARTITIONED BY (join_date INT)
STORED AS PARQUET
LOCATION '/user/employee/afterpartition';

select * from employee_afterpartition_timestamp_parq;
-- 0 records returned, because the existing partitions are not yet registered in the metastore

-- Register the partitions. RECOVER PARTITIONS works in Impala; MSCK (the metastore check command with the repair table option) works in Hive.
-- Impala:
alter table employee_afterpartition_timestamp_parq RECOVER PARTITIONS;
-- Hive:
MSCK REPAIR TABLE employee_afterpartition_timestamp_parq;
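
Before rerunning the query, it can help to confirm that the repair step actually registered the partitions. A minimal sketch, assuming the table name from step 5; SHOW PARTITIONS is available in both Hive and Impala, and the exact partition values listed depend on the data loaded in step 4:

```sql
-- List the partitions the metastore now knows about for the TIMESTAMP-typed table.
-- Each registered partition appears as one row of the form join_date=<value>.
SHOW PARTITIONS employee_afterpartition_timestamp_parq;
```

If no partitions are listed, the select in the next step returns 0 records rather than failing, so the ClassCastException only appears once the partitions are visible.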

select * from employee_afterpartition_timestamp_parq;

Actual Result :: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

Expected Result :: The data should be displayed

Note: If the file format is TEXTFILE instead of Parquet, the data can be fetched.
Observation: The two tables declare different types for the dateofBirth column (STRING and TIMESTAMP) but point to the same HDFS location.
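
The difference between the two file formats is consistent with how they carry types: a Parquet file records each column's type in its own metadata, so files written with dateofBirth as a string are read back as Text regardless of what the second table declares, while a text file is parsed at read time according to the table definition. One possible workaround, sketched here under assumptions, is to write a separate Parquet table whose files really contain timestamps by casting while reading through the STRING-typed table. The table name emp_afterpartition_ts and its location are hypothetical, chosen only for illustration; values that Hive cannot parse as a timestamp are expected to come back as NULL.

```sql
-- Hypothetical target table and location (not part of the original report).
-- It must not share a location with the STRING-typed table.
CREATE EXTERNAL TABLE IF NOT EXISTS emp_afterpartition_ts(
id INT, first_name STRING, last_name STRING, dateofBirth TIMESTAMP)
COMMENT 'Employee partitioned table with dateofBirth converted to TIMESTAMP'
PARTITIONED BY (join_date INT)
STORED AS PARQUET
LOCATION '/user/employee/afterpartition_ts';

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Read through the table whose schema matches the files (STRING) and convert
-- explicitly. The dynamic partition column join_date must come last in the select.
INSERT OVERWRITE TABLE emp_afterpartition_ts PARTITION (join_date)
SELECT id, first_name, last_name, CAST(dateofBirth AS TIMESTAMP), join_date
FROM emp_afterpartition;

select * from emp_afterpartition_ts;
```

This keeps the original Parquet files untouched and makes the declared column type agree with what is stored in the new files.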

Sample data
===========

1,Joyce,Garza,2016-07-17 14:42:18,201607
2,Jerry,Ortiz,2016-08-17 21:36:54,201608
3,Steven,Ryan,2016-09-10 01:32:40,201609
4,Lisa,Black,2015-10-12 15:05:13,201610
5,Jose,Turner,2015-011-10 06:38:40,201611
6,Joyce,Garza,2016-08-02,201608
7,Jerry,Ortiz,2016-01-01,201601
8,Steven,Ryan,2016/08/20,201608
9,Lisa,Black,2016/09/12,201609
10,Jose,Turner,09/19/2016,201609
11,Jose,Turner,20160915,201609

People

Assignee: Unassigned

Reporter: Narasimha

Votes: 0

Watchers: 3

Dates

Created: 21/Sep/16 17:59

Updated: 26/Oct/16 21:32

Resolved: 26/Oct/16 21:32