[SPARK-29234] bucketed table created by Spark SQL DataFrame is in SequenceFile format - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: SQL
Labels: None

Description

When we create a bucketed table as follows, its input and output formats are displayed as SequenceFile. Physically, however, the files are created in HDFS in the format specified by the user (e.g. ORC or Parquet).

df.write.format("orc").bucketBy(4,"order_status").saveAsTable("OrdersExample")

In Hive, DESCRIBE FORMATTED OrdersExample; returns:

describe formatted ordersExample;
OK

col_name data_type comment
col array<string> from deserializer

Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

Querying the same table in Hive fails with an error:

select * from OrdersExample;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://nn01.itversity.com:8020/apps/hive/warehouse/kuki.db/ordersexample/part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc not a SequenceFile
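The mismatch described above can be detected mechanically. The following is a minimal sketch in plain Python, not part of Spark or Hive: it compares the InputFormat reported by DESCRIBE FORMATTED with the extension of a data file. The helper names (`declared_format`, `physical_format`, `has_format_mismatch`) are hypothetical, and the ORC and Parquet class names in the mapping are the standard Hive ones, assumed here because the report only shows the SequenceFile entry.

```python
import re

# Assumed mapping from Hive InputFormat class names to file formats.
# Only the SequenceFile entry appears in this report; the ORC and
# Parquet entries are the standard Hive class names.
INPUT_FORMAT_TO_FILE_FORMAT = {
    "org.apache.hadoop.mapred.SequenceFileInputFormat": "sequencefile",
    "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat": "orc",
    "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat": "parquet",
}


def declared_format(describe_output):
    """Return the format implied by the InputFormat line, or None if unknown."""
    match = re.search(r"^\s*InputFormat:\s*(\S+)", describe_output, re.MULTILINE)
    if match is None:
        return None
    return INPUT_FORMAT_TO_FILE_FORMAT.get(match.group(1))


def physical_format(file_path):
    """Guess the on-disk format from the file extension (hypothetical heuristic)."""
    name = file_path.rsplit("/", 1)[-1]
    suffix = name.rsplit(".", 1)[-1].lower()
    return suffix if suffix in ("orc", "parquet") else None


def has_format_mismatch(describe_output, file_path):
    """True when the metastore format and the file extension disagree."""
    declared = declared_format(describe_output)
    physical = physical_format(file_path)
    return declared is not None and physical is not None and declared != physical


# Values copied from this report.
describe_output = """\
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
"""
data_file = "part-00000-55920574-eeb5-48b7-856d-e5c27e85ba12_00000.c000.snappy.orc"

print(has_format_mismatch(describe_output, data_file))
```

For the values in this report the check flags a mismatch: the metastore declares SequenceFile while the data file carries an .orc extension, which is consistent with the "not a SequenceFile" error above.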

Issue Links

duplicates

SPARK-27592 Set the bucketed data source table SerDe correctly

Resolved

People

Assignee: Unassigned

Reporter: Suchintak Patnaik

Votes: 0

Watchers: 4

Dates

Created: 24/Sep/19 18:56

Updated: 28/Oct/19 19:36

Resolved: 25/Sep/19 00:13