SPARK-29195

Can't config orc.compress.size option for native ORC writer


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: Linux, Java 1.8.0

    Description

       Only the compression codec can be configured effectively via code; "orc.compress.size" and "orc.row.index.stride" cannot.

       

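      // Attempt: request a 512 KiB ORC compression buffer under several candidate keys.
      // (appName is a String defined elsewhere.)
        import org.apache.spark.sql.SparkSession

        val spark = SparkSession
          .builder()
          .appName(appName)
          .enableHiveSupport()
          .config("spark.sql.orc.impl", "native")                  // use the native ORC writer
          .config("orc.compress.size", 512 * 1024)                 // ORC property name
          .config("spark.sql.orc.compress.size", 512 * 1024)       // guessed Spark SQL key
          .config("hive.exec.orc.default.buffer.size", 512 * 1024) // Hive property name
          .config("spark.hadoop.io.file.buffer.size", 512 * 1024)  // Hadoop I/O buffer
          .getOrCreate()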
      

      orcfiledump still shows:
       

      File Version: 0.12 with FUTURE
      
      Compression: ZLIB
      Compression size: 65536
      

       
      Executor Log:

      impl.WriterImpl: ORC writer created for path: hdfs://name_node_host:9000/foo/bar/_temporary/0/_temporary/attempt_20190920222359_0001_m_000127_0/part-00127-2a9a9287-54bf-441c-b3cf-718b122d9c2f_00127.c000.zlib.orc with stripeSize: 67108864 blockSize: 268435456 compression: ZLIB bufferSize: 65536
      
      File Output Committer Algorithm version is 2
      

      According to SPARK-23342, the other ORC options should be configurable. Is there anything missing here?
      Is there any other way to affect "orc.compress.size"?
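
      One thing that may be worth trying is sketched below. It rests on two assumptions that have not been verified against 2.3.0 in this report: that properties prefixed with "spark.hadoop." are copied into the Hadoop Configuration seen by the ORC writer, and that "orc.*" keys passed as DataFrameWriter options reach the native writer. The object name, app name, helper function, stride value, and output path are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.SparkSession

object OrcCompressSizeSketch {
  // Writer options keyed by the ORC property names quoted in this issue.
  // Values are strings because DataFrameWriter.options takes a Map[String, String].
  def orcWriterOptions(compressSizeBytes: Int, rowIndexStride: Int): Map[String, String] =
    Map(
      "orc.compress.size"    -> compressSizeBytes.toString,
      "orc.row.index.stride" -> rowIndexStride.toString
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("orc-compress-size-sketch") // hypothetical app name
      .config("spark.sql.orc.impl", "native")
      // Assumption: the "spark.hadoop." prefix forwards this key to the Hadoop Configuration.
      .config("spark.hadoop.orc.compress.size", (512 * 1024).toString)
      .getOrCreate()

    // Assumption: "orc.*" write options are passed through to the native ORC writer.
    spark.range(0, 1000000).toDF("id")
      .write
      .format("orc")
      .options(orcWriterOptions(512 * 1024, 10000))
      .mode("overwrite")
      .save("/tmp/orc_compress_size_sketch") // hypothetical output path

    spark.stop()
  }
}
```

      If either assumption holds, running orcfiledump on the resulting files should report a "Compression size" other than 65536; if it still shows 65536, neither route took effect for this write path.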

          People

            Assignee: Unassigned
            Reporter: Eric Sun (ericsun2)