Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.3.0
- None
- None
Description
When we create a bucketed table as follows, its input and output formats are displayed as SequenceFile formats, but the files are physically created in HDFS in the format specified by the user (e.g. ORC, Parquet).
df.write.format("orc").bucketBy(4,"order_status").saveAsTable("OrdersExample")
In Hive, DESCRIBE FORMATTED OrdersExample; reports the following:
describe formatted ordersExample;
OK
# col_name data_type comment
col array<string> from deserializer
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
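The mismatch above can be checked mechanically. The following is a minimal sketch, not part of the original report: it parses the storage lines of a DESCRIBE FORMATTED dump and compares the reported InputFormat with the class one would normally expect for the format passed to `df.write.format(...)`. The helper names `parse_storage_info` and `format_mismatch` and the `EXPECTED_INPUT_FORMAT` table are hypothetical, and the table covers only ORC and Parquet.

```python
# Hive input format classes usually associated with each writer format.
# Assumption: only ORC and Parquet are covered in this sketch.
EXPECTED_INPUT_FORMAT = {
    "orc": "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
    "parquet": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
}

STORAGE_KEYS = ("SerDe Library", "InputFormat", "OutputFormat")


def parse_storage_info(describe_output: str) -> dict:
    """Collect the 'key: value' storage lines from DESCRIBE FORMATTED text."""
    info = {}
    for line in describe_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in STORAGE_KEYS:
            info[key.strip()] = value.strip()
    return info


def format_mismatch(describe_output: str, writer_format: str) -> bool:
    """Return True when the metastore InputFormat differs from the expected one."""
    expected = EXPECTED_INPUT_FORMAT[writer_format.lower()]
    actual = parse_storage_info(describe_output).get("InputFormat")
    return actual != expected


# Storage lines copied from the output in this report.
reported = """SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat"""

mismatch = format_mismatch(reported, "orc")
print("metastore format differs from writer format:", mismatch)
```

For the output in this report the check flags a mismatch, because the table was written as ORC while the metastore records SequenceFile classes.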
Querying the same table in Hive fails with an error:
select * from OrdersExample;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://nn01.itversity.com:8020/apps/hive/warehouse/kuki.db/ordersexample/part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc not a SequenceFile
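The "not a SequenceFile" message is consistent with a reader that inspects the file header: a SequenceFile begins with the bytes `SEQ`, whereas an ORC file begins with `ORC` and a Parquet file with `PAR1`. As a rough illustration, the sketch below identifies a file by its leading bytes. It assumes the part file has first been copied to the local filesystem, and the function name `detect_format` is hypothetical; it is not how Hive itself is implemented.

```python
import os
import tempfile


def detect_format(path: str) -> str:
    """Guess the container format of a local file from its leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(b"SEQ"):   # Hadoop SequenceFile: 'SEQ' + version byte
        return "sequencefile"
    if head.startswith(b"ORC"):   # ORC file header
        return "orc"
    if head == b"PAR1":           # Parquet file header
        return "parquet"
    return "unknown"


# Stand-in for a copied part file: only the header matters for this check.
with tempfile.NamedTemporaryFile(suffix=".snappy.orc", delete=False) as tmp:
    tmp.write(b"ORC" + b"\x00" * 16)
    demo_path = tmp.name

detected = detect_format(demo_path)
os.remove(demo_path)
print(detected)
```

A file that reports `orc` here while the table metadata names SequenceFileInputFormat matches the failure shown above.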
Issue Links
- duplicates SPARK-27592 Set the bucketed data source table SerDe correctly (Resolved)