Spark / SPARK-25635

Support selective direct encoding in native ORC write


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 2.4.1, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Before ORC 1.5.3, `orc.dictionary.key.threshold` and `hive.exec.orc.dictionary.key.size.threshold` are applied to all columns. This is a big hurdle to enabling dictionary encoding.

      Starting with ORC 1.5.3, `orc.column.encoding.direct` is available to enforce direct encoding selectively, on a per-column basis. This issue aims to add that feature by upgrading ORC from 1.5.2 to 1.5.3.
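
      As a rough illustration of how the two settings might be combined on the write path, the sketch below builds an options map that keeps dictionary encoding enabled through `orc.dictionary.key.threshold` while listing the columns that should use direct encoding in `orc.column.encoding.direct` (a comma-separated list of column names). The helper `orc_write_options` and the column name `uniqColumn` are hypothetical names chosen for this example, and the PySpark call in the trailing comment assumes a DataFrame `df` and an ORC version of 1.5.3 or later.

```python
def orc_write_options(direct_columns, dictionary_threshold=1.0):
    """Build ORC writer options (hypothetical helper, not part of Spark or ORC).

    direct_columns: names of columns that should use direct encoding.
    dictionary_threshold: value for orc.dictionary.key.threshold (0.0 to 1.0).
    """
    if not 0.0 <= dictionary_threshold <= 1.0:
        raise ValueError("dictionary_threshold must be between 0.0 and 1.0")
    # Drop empty entries and surrounding whitespace before joining.
    columns = [c.strip() for c in direct_columns if c.strip()]
    return {
        "orc.dictionary.key.threshold": str(dictionary_threshold),
        "orc.column.encoding.direct": ",".join(columns),
    }


# Example: keep dictionary encoding on, except for one high-cardinality column.
options = orc_write_options(["uniqColumn"])
print(options)

# With PySpark (not imported here), the map could be passed to the writer:
#   df.write.options(**options).orc("/tmp/orc_direct")
```

      Keeping the options in a plain dictionary makes it easy to reuse the same settings across several writes.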

      The following patches are included in ORC 1.5.3; this feature is the only one directly related to Spark.

      ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte data (gopalv)
      ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
      ORC-405. Remove calcite as a dependency from the benchmarks.
      ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
      ORC-383: Parallel builds fails with ConcurrentModificationException
      ORC-382: Apache rat exclusions + add rat check to travis
      ORC-401: Fix incorrect quoting in specification.
      ORC-385. Change RecordReader to extend Closeable.
      ORC-384: [C++] fix memory leak when loading non-ORC files
      ORC-391: [c++] parseType does not accept underscore in the field name
      ORC-397. Allow selective disabling of dictionary encoding. Original patch was by Mithun Radhakrishnan.
      ORC-389: Add ability to not decode Acid metadata columns
      


          People

            Assignee: Dongjoon Hyun
            Reporter: Dongjoon Hyun
            Votes: 0
            Watchers: 2
