[ORC-502] Hive ORC read INT, BIGINT as NULL for Data created by Spark - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Preconditions

Create the file ratings.csv with the following content and upload it to HDFS as /user/test/rating/ratings.csv:

userId,movieId,rating,timestamp
1,2,4.5,1784325658

The corresponding data.orc file is attached.

Steps to reproduce:

1. Using Spark (tested on versions 2.2.1 and 2.3.1), create a DataFrame (df) from the CSV file with inferSchema enabled.

val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/user/test/rating/ratings.csv")

2. Save df as an ORC file.

df.write.format("orc").save("/user/test/spark_rating_orc_typesafe")

3. Using Hive 2.3, create an external table over the directory written by Spark.

create external table rating_orc_hive_type_1(userId int, movieId int, rating double, `timestamp` int) stored as ORC location "/user/test/spark_rating_orc_typesafe/";

4. Query the table.

select * from rating_orc_hive_type_1;

The integer columns userId and movieId are returned as NULL (the same happens with BIGINT); only rating and timestamp are printed.

OK
NULL    NULL    4.5     1784325658
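
The pattern in this output (NULL for userId and movieId, values for rating and timestamp) is what case-sensitive column name matching would produce, which is the subject of the linked duplicate ORC-264. The sketch below is a simplified model of that behavior, not the actual ORC reader code. It assumes that Spark writes the column names into the ORC file with the case taken from the CSV header and that Hive stores table column names in lowercase; the object and function names are hypothetical.

```scala
import java.util.Locale

// Simplified model of name-based column resolution; not the actual ORC reader code.
object ColumnMatchSketch {
  // Column names as Spark is assumed to write them (case preserved from the CSV header).
  val fileColumns: Seq[String] = Seq("userId", "movieId", "rating", "timestamp")

  // The single data row from ratings.csv, keyed by file column name.
  val fileRow: Map[String, String] =
    fileColumns.zip(Seq("1", "2", "4.5", "1784325658")).toMap

  // Column names as Hive is assumed to store them (lowercased).
  val tableColumns: Seq[String] = Seq("userid", "movieid", "rating", "timestamp")

  // Case-sensitive lookup: a table column with no exact match reads as NULL.
  def readCaseSensitive(row: Map[String, String], table: Seq[String]): Seq[String] =
    table.map(name => row.getOrElse(name, "NULL"))

  // Case-insensitive lookup: compare lowercased names on both sides.
  def readCaseInsensitive(row: Map[String, String], table: Seq[String]): Seq[String] = {
    val byLowerName = row.map { case (name, value) =>
      name.toLowerCase(Locale.ROOT) -> value
    }
    table.map(name => byLowerName.getOrElse(name.toLowerCase(Locale.ROOT), "NULL"))
  }

  def main(args: Array[String]): Unit = {
    // Prints: NULL  NULL  4.5  1784325658 (tab-separated)
    println(readCaseSensitive(fileRow, tableColumns).mkString("\t"))
    // Prints: 1  2  4.5  1784325658 (tab-separated)
    println(readCaseInsensitive(fileRow, tableColumns).mkString("\t"))
  }
}
```

Under these assumptions, only the names that are already all lowercase in the file (rating, timestamp) resolve with the case-sensitive lookup, which matches the row shown above.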

Attachments

data.orc, 0.5 kB, 20/May/19 11:28, Oleksiy Sayankin

Issue Links

duplicates ORC-264: column name matching while schema evolution should be case unaware. (Closed)

People

Assignee: Unassigned
Reporter: Oleksiy Sayankin
Votes: 1
Watchers: 5

Dates

Created: 20/May/19 11:22
Updated: 12/Jan/21 16:58
Resolved: 12/Jan/21 16:58