[SPARK-29195] Can't config orc.compress.size option for native ORC writer - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: Spark Core
Labels:
- ORC
- bulk-closed
Environment:

Linux
Java 1.8.0

Description

Only the compression codec can be configured effectively in code; "orc.compress.size" and "orc.row.index.stride" cannot.

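// Attempt: request a 512 KiB ORC compression buffer through several config keys
import org.apache.spark.sql.SparkSession

val appName = "orc-compress-size-test" // placeholder; the original value is not shown
val spark = SparkSession
  .builder()
  .appName(appName)
  .enableHiveSupport()
  .config("spark.sql.orc.impl", "native") // use the native (non-Hive) ORC writer
  .config("orc.compress.size", 512 * 1024)
  .config("spark.sql.orc.compress.size", 512 * 1024)
  .config("hive.exec.orc.default.buffer.size", 512 * 1024)
  .config("spark.hadoop.io.file.buffer.size", 512 * 1024)
  .getOrCreate()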

orcfiledump still shows:

File Version: 0.12 with FUTURE

Compression: ZLIB
Compression size: 65536

Executor Log:

impl.WriterImpl: ORC writer created for path: hdfs://name_node_host:9000/foo/bar/_temporary/0/_temporary/attempt_20190920222359_0001_m_000127_0/part-00127-2a9a9287-54bf-441c-b3cf-718b122d9c2f_00127.c000.zlib.orc with stripeSize: 67108864 blockSize: 268435456 compression: ZLIB bufferSize: 65536

File Output Committer Algorithm version is 2

According to SPARK-23342, the other ORC options should be configurable. Am I missing something here?
Is there any other way to set "orc.compress.size"?
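For what it is worth, the executor log above reports stripeSize: 67108864 alongside bufferSize: 65536. One possible explanation, unconfirmed here, is that ORC's WriterImpl does not use "orc.compress.size" as-is unless buffer-size enforcement is switched on. Instead it may derive a buffer size from the stripe size and column count, capped at the requested value. The Scala sketch below models that heuristic as we understand it. The object and function names are hypothetical, and the `20 * numColumns` divisor and the 4 KiB to 256 KiB buckets are assumptions to check against the ORC version bundled with Spark 2.3. The column counts in the example are invented because the issue does not state the table's width.

```scala
// Hypothetical sketch of the buffer-size heuristic that ORC's WriterImpl is
// ASSUMED to apply when buffer-size enforcement is off. This is not Spark or
// ORC code; verify against the ORC source for your version before relying on it.
object OrcBufferSizeSketch {
  private val KiB = 1024

  // Snap an estimate up to the next of 4, 8, 16, 32, 64 or 128 KiB;
  // anything larger becomes 256 KiB, so the result never exceeds 256 KiB.
  def closestBufferSize(est: Int): Int = {
    val buckets = Seq(4, 8, 16, 32, 64, 128).map(_ * KiB)
    buckets.find(est <= _).getOrElse(256 * KiB)
  }

  // Assumed estimate: stripeSize / (20 * numColumns), snapped to a bucket,
  // then capped at the requested size (the value of orc.compress.size).
  def estimatedBufferSize(stripeSize: Long, numColumns: Int, requested: Int): Int = {
    val est = (stripeSize / (20L * numColumns)).toInt
    math.min(closestBufferSize(est), requested)
  }
}

// Example (column count is illustrative, not taken from the issue):
// OrcBufferSizeSketch.estimatedBufferSize(67108864L, 60, 512 * 1024) gives 65536
```

Under these assumptions, a 64 MiB stripe with 60 columns yields 65536 bytes even though 512 KiB was requested. No column count produces more than 256 KiB, so a 512 KiB setting would be unreachable whichever config key carries it.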