Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
Description
Preconditions
Create the file ratings.csv with the following content and upload it to HDFS as /user/test/rating/ratings.csv:
userId,movieId,rating,timestamp
1,2,4.5,1784325658
See the corresponding data.orc file in the attachments.
STR:
1. Using Spark (tested on versions 2.2.1 and 2.3.1), create a DataFrame (df) from the CSV file with inferSchema enabled:
val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/user/test/rating/ratings.csv")
2. Save df as an ORC file:
df.write.format("orc").save("/user/test/spark_rating_orc_typesafe")
3. Using Hive 2.3, create the corresponding external table:
create external table rating_orc_hive_type_1(userId int, movieId int, rating double, `timestamp` int) stored as ORC location "/user/test/spark_rating_orc_typesafe/";
4. Query the table:
select * from rating_orc_hive_type_1;
The integer columns userId and movieId come back as NULL (also when they are declared as BIGINT); the double value is printed correctly:
OK
NULL NULL 4.5 1784325658
Attachments
Issue Links
- duplicates ORC-264: column name matching while schema evolution should be case unaware. (Closed)
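
The linked issue title suggests that the NULLs come from case-sensitive column name matching. The following is a minimal sketch of that effect in plain Scala, not the actual ORC or Hive reader code. It assumes that Spark keeps the CSV header names (userId, movieId) in the ORC file and that Hive stores the table's column names in lower case; the object and function names are hypothetical and exist only for illustration.

```scala
object ColumnNameMatching {
  // Field names and values as assumed to be stored in the ORC file
  // (names taken verbatim from the CSV header).
  val fileRow: Seq[(String, String)] = Seq(
    "userId"    -> "1",
    "movieId"   -> "2",
    "rating"    -> "4.5",
    "timestamp" -> "1784325658"
  )

  // Table column names from the DDL, lower-cased (assumption about how Hive keeps them).
  val tableColumns: Seq[String] =
    Seq("userId", "movieId", "rating", "timestamp").map(_.toLowerCase)

  // Case-sensitive lookup: a table column without an identically
  // spelled file field is reported as NULL.
  def readCaseSensitive(row: Seq[(String, String)], columns: Seq[String]): Seq[String] =
    columns.map { c =>
      row.collectFirst { case (name, value) if name == c => value }.getOrElse("NULL")
    }

  // Case-insensitive lookup: names that differ only in case still match.
  def readCaseInsensitive(row: Seq[(String, String)], columns: Seq[String]): Seq[String] =
    columns.map { c =>
      row.collectFirst { case (name, value) if name.equalsIgnoreCase(c) => value }.getOrElse("NULL")
    }

  def main(args: Array[String]): Unit = {
    // prints: NULL NULL 4.5 1784325658
    println(readCaseSensitive(fileRow, tableColumns).mkString(" "))
    // prints: 1 2 4.5 1784325658
    println(readCaseInsensitive(fileRow, tableColumns).mkString(" "))
  }
}
```

Under these assumptions the case-sensitive lookup reproduces the row reported above: only the columns whose names are already all lower case (rating, timestamp) are resolved.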