[SPARK-33978] Support ZSTD compression in ORC data source - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0
Fix Version/s: 3.2.0
Component/s: SQL
Labels: None

Description

What changes were proposed in this pull request?

This PR aims to support ZSTD compression in the ORC data source.

Why are the changes needed?

Apache ORC 1.6 supports ZSTD compression, which produces more compact files and reduces storage cost.

BEFORE

scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd")
 java.lang.IllegalArgumentException: Codec [zstd] is not available. Available codecs are uncompressed, lzo, snappy, zlib, none.
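
The error above is consistent with the requested codec name being looked up in a fixed set of short names. The following is a toy model of that lookup, not Spark's actual code: the object and method names are hypothetical, the accepted names are taken from the error message, and the right-hand values are assumed to be ORC compression kind names.

```scala
// Toy model only: names here are hypothetical, not Spark's real internals.
object OrcCodecLookup {
  // Short names accepted before the change, taken from the error message.
  val before: Map[String, String] = Map(
    "uncompressed" -> "NONE",
    "none"         -> "NONE",
    "lzo"          -> "LZO",
    "snappy"       -> "SNAPPY",
    "zlib"         -> "ZLIB"
  )

  // After the change, "zstd" is accepted as well.
  val after: Map[String, String] = before + ("zstd" -> "ZSTD")

  // Resolves a short codec name, or fails listing the accepted names (sorted here).
  def resolve(codecs: Map[String, String], name: String): String = {
    val key = name.toLowerCase(java.util.Locale.ROOT)
    codecs.getOrElse(key, {
      val available = codecs.keys.toSeq.sorted.mkString(", ")
      throw new IllegalArgumentException(
        s"Codec [$key] is not available. Available codecs are $available.")
    })
  }
}
```

In this model, `OrcCodecLookup.resolve(OrcCodecLookup.before, "zstd")` throws an `IllegalArgumentException`, while the same call against `OrcCodecLookup.after` succeeds.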

AFTER

scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd")

$ orc-tools meta /tmp/zstd
Processing data file file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc [length: 230]
Structure for file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc
File Version: 0.12 with ORC_14
Rows: 1
Compression: ZSTD
Compression size: 262144
Calendar: Julian/Gregorian
Type: struct<id:bigint>
Stripe Statistics:
  Stripe 1:
    Column 0: count: 1 hasNull: false
    Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
File Statistics:
  Column 0: count: 1 hasNull: false
  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9
Stripes:
  Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 1 section ROW_INDEX start: 14 length 24
    Stream: column 1 section DATA start: 38 length 6
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
File length: 230 bytes
Padding length: 0 bytes
Padding ratio: 0%
User Metadata:
  org.apache.spark.version=3.2.0
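
For reference, the codec can also be chosen session-wide rather than per write. The sketch below assumes Spark 3.2.0 or later on the classpath and a local master. The object name, helper name, and output path are hypothetical. `spark.sql.orc.compression.codec` is the existing session-level ORC codec setting, and a per-write `compression` option overrides it.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper for illustration; assumes Spark 3.2.0+ is available.
object ZstdOrcRoundTrip {
  // Writes `rows` consecutive ids as ZSTD-compressed ORC and returns the row count read back.
  def writeAndCount(spark: SparkSession, path: String, rows: Long): Long = {
    // Session-wide default codec for ORC writes.
    spark.conf.set("spark.sql.orc.compression.codec", "zstd")
    spark.range(rows).write.mode("overwrite").orc(path)
    spark.read.orc(path).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("zstd-orc-round-trip")
      .master("local[*]")
      .getOrCreate()
    try {
      val n = writeAndCount(spark, "/tmp/zstd-example", 10L)
      println(s"rows read back: $n")
    } finally {
      spark.stop()
    }
  }
}
```

Running `orc-tools meta` on the resulting directory, as in the AFTER transcript, is one way to confirm that the files report `Compression: ZSTD`.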

Issue Links

links to

[Github] Pull Request #31002 (dongjoon-hyun)

People

Assignee: Dongjoon Hyun

Reporter: Dongjoon Hyun

Votes: 0

Watchers: 2

Dates

Created: 04/Jan/21 06:29

Updated: 06/Mar/21 23:59

Resolved: 04/Jan/21 08:55