Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Environment: Linux, Java 1.8.0
Description
Only the compression codec can be effectively configured via code; "orc.compress.size" and "orc.row.index.stride" cannot be.
// try
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName(appName)
  .enableHiveSupport()
  .config("spark.sql.orc.impl", "native")
  .config("orc.compress.size", 512 * 1024)
  .config("spark.sql.orc.compress.size", 512 * 1024)
  .config("hive.exec.orc.default.buffer.size", 512 * 1024)
  .config("spark.hadoop.io.file.buffer.size", 512 * 1024)
  .getOrCreate()
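A possibly relevant detail (my observation, not confirmed): the builder's .config entries land in the Spark session conf, whereas only "spark.hadoop."-prefixed keys are copied into the Hadoop Configuration that the native ORC writer reads. A quick check, assuming the session built above:

// Sketch: verify where the builder setting actually lands.
println(spark.conf.get("orc.compress.size"))
// expected: 524288 -- the key is stored in the Spark session conf

println(spark.sparkContext.hadoopConfiguration.get("orc.compress.size"))
// expected: null -- bare keys are not copied into the Hadoop Configuration,
// which is where the ORC writer settings are picked up from (assumption)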
orcfiledump still shows:
File Version: 0.12 with FUTURE
Compression: ZLIB
Compression size: 65536
Executor Log:
impl.WriterImpl: ORC writer created for path: hdfs://name_node_host:9000/foo/bar/_temporary/0/_temporary/attempt_20190920222359_0001_m_000127_0/part-00127-2a9a9287-54bf-441c-b3cf-718b122d9c2f_00127.c000.zlib.orc with stripeSize: 67108864 blockSize: 268435456 compression: ZLIB bufferSize: 65536
File Output Committer Algorithm version is 2
According to SPARK-23342, the other ORC options should be configurable. Is there anything missing here?
Is there any other way to affect "orc.compress.size"?
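One route that may be worth trying (a sketch, not a confirmed fix): the tests added by SPARK-23342 pass ORC properties as per-write data source options rather than session configs, so the setting might reach the writer's Hadoop configuration when placed on the DataFrameWriter directly. Paths and names below are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("orc-compress-size").getOrCreate()
val df = spark.range(1000000L).toDF("id")

// Set the ORC property on the write itself instead of the session builder.
df.write
  .option("orc.compress.size", 512L * 1024)  // request 524288-byte buffers
  .option("compression", "zlib")
  .orc("hdfs://name_node_host:9000/foo/bar_options_test")

If the writer log still reports bufferSize: 65536 with this, the per-write option path is affected as well.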
Issue Links
- is related to: SPARK-23342 Add ORC configuration tests for ORC data source (Resolved)