  Spark / SPARK-29195

Can't configure the orc.compress.size option for the native ORC writer


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: Linux, Java 1.8.0

    Description

       Only the compression codec can be configured effectively from code; "orc.compress.size" and "orc.row.index.stride" cannot.

       

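      // Attempt: each key below tries to raise the ORC compression buffer to 512 KiB.
      import org.apache.spark.sql.SparkSession

      val appName = "orc-compress-size-test" // hypothetical name; appName was not defined in the original snippet
      val spark = SparkSession
        .builder()
        .appName(appName)
        .enableHiveSupport()
        .config("spark.sql.orc.impl", "native")                  // use the native (Apache ORC) writer
        .config("orc.compress.size", 512 * 1024)                 // ORC buffer-size key
        .config("spark.sql.orc.compress.size", 512 * 1024)
        .config("hive.exec.orc.default.buffer.size", 512 * 1024) // Hive's equivalent setting
        .config("spark.hadoop.io.file.buffer.size", 512 * 1024)  // Hadoop I/O buffer, forwarded via the spark.hadoop. prefix
        .getOrCreate()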
      

      orcfiledump still shows a compression size of 65536 bytes (64 KiB) rather than the requested 524288 (512 KiB):
       

      File Version: 0.12 with FUTURE
      
      Compression: ZLIB
      Compression size: 65536
      

       
      Executor Log:

      impl.WriterImpl: ORC writer created for path: hdfs://name_node_host:9000/foo/bar/_temporary/0/_temporary/attempt_20190920222359_0001_m_000127_0/part-00127-2a9a9287-54bf-441c-b3cf-718b122d9c2f_00127.c000.zlib.orc with stripeSize: 67108864 blockSize: 268435456 compression: ZLIB bufferSize: 65536
      
      File Output Committer Algorithm version is 2
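The orcfiledump output and the executor log both report the same 65536-byte buffer. When trying different settings, a small check like the following can confirm whether a change actually took effect. It is a minimal sketch: `OrcBufferSizeCheck` is a hypothetical helper, not a Spark or ORC API, and it assumes the `Compression size:` and `bufferSize:` text formats quoted in this report.

```scala
// Hypothetical helper (not part of Spark or ORC): extracts the compression
// buffer size from the two text formats quoted in this report.
object OrcBufferSizeCheck {
  // Matches "Compression size: 65536" as printed by orcfiledump.
  private val DumpPattern = """Compression size:\s*(\d+)""".r.unanchored
  // Matches "... bufferSize: 65536" as printed in the ORC WriterImpl log line.
  private val LogPattern = """bufferSize:\s*(\d+)""".r.unanchored

  /** Returns the buffer size found in `text`, if either format matches. */
  def bufferSize(text: String): Option[Int] = text match {
    case DumpPattern(n) => Some(n.toInt)
    case LogPattern(n)  => Some(n.toInt)
    case _              => None
  }

  /** True only when the reported size equals the size that was requested. */
  def tookEffect(text: String, requested: Int): Boolean =
    bufferSize(text).contains(requested)
}
```

Applied to the executor log line quoted above, `OrcBufferSizeCheck.tookEffect(line, 512 * 1024)` returns false, because the writer reports `bufferSize: 65536`.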
      

      According to SPARK-23342, the other ORC options should also be configurable. Am I missing something here?
      Is there any other way to set "orc.compress.size"?
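Two other routes one might try are passing the ORC keys per write through the DataFrameWriter options, or setting them with the `spark.hadoop.` prefix, which Spark documents as copying properties into the Hadoop `Configuration`. The sketch only builds those option maps; `OrcWriterSettings` and its methods are hypothetical, and whether the 2.3.0 native writer honors either route for `orc.compress.size` is unverified.

```scala
// Hypothetical helper: builds option maps for ORC writer settings.
// Whether Spark 2.3.0's native ORC writer honors them is not verified here.
object OrcWriterSettings {
  val BufferSizeKey = "orc.compress.size"
  val RowIndexStrideKey = "orc.row.index.stride"

  /** Raw ORC keys, suitable for DataFrameWriter.options(...) on a single write. */
  def writerOptions(bufferBytes: Int, rowIndexStride: Int): Map[String, String] = {
    require(bufferBytes > 0 && rowIndexStride > 0, "sizes must be positive")
    Map(
      BufferSizeKey -> bufferBytes.toString,
      RowIndexStrideKey -> rowIndexStride.toString
    )
  }

  /** The same keys with the spark.hadoop. prefix, for SparkSession.builder().config(k, v). */
  def sparkHadoopConf(opts: Map[String, String]): Map[String, String] =
    opts.map { case (k, v) => s"spark.hadoop.$k" -> v }
}
```

With an existing DataFrame `df` and `val opts = OrcWriterSettings.writerOptions(512 * 1024, 10000)`, the per-write route would be `df.write.options(opts).orc(path)`; the session route would pass each entry of `OrcWriterSettings.sparkHadoopConf(opts)` to `.config(k, v)` before `getOrCreate()`.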


            People

              Assignee: Unassigned
              Reporter: Eric Sun (ericsun2)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: