Uploaded image for project: 'ORC'
  1. ORC
  2. ORC-46

Reserve CompressionKind values for LZ4 and ZSTD

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels:
      None

      Description

      We are going to start using ORC with LZ4 and ZSTD compression types in Presto (which has its own reader and soon its own writer). Registering these as official CompressionKind values now will avoid future compatibility issues.

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user electrum opened a pull request:

        https://github.com/apache/orc/pull/20

        ORC-46. Reserve CompressionKind values for LZ4 and ZSTD.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/electrum/orc compression

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/orc/pull/20.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #20



        Show
        githubbot ASF GitHub Bot added a comment - GitHub user electrum opened a pull request: https://github.com/apache/orc/pull/20 ORC-46 . Reserve CompressionKind values for LZ4 and ZSTD. You can merge this pull request into a Git repository by running: $ git pull https://github.com/electrum/orc compression Alternatively you can review and apply these changes as the patch at: https://github.com/apache/orc/pull/20.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user omalley commented on the pull request:

        https://github.com/apache/orc/pull/20#issuecomment-207034812

        This seems reasonable. Are you working on implementing the compression? I'd love to get the integration in so that others could experiment with them.

        Show
        githubbot ASF GitHub Bot added a comment - Github user omalley commented on the pull request: https://github.com/apache/orc/pull/20#issuecomment-207034812 This seems reasonable. Are you working on implementing the compression? I'd love to get the integration in so that others could experiment with them.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user electrum commented on the pull request:

        https://github.com/apache/orc/pull/20#issuecomment-207036863

        We will implement them in Presto using this compression code (zstd coming soon): https://github.com/airlift/aircompressor

        Show
        githubbot ASF GitHub Bot added a comment - Github user electrum commented on the pull request: https://github.com/apache/orc/pull/20#issuecomment-207036863 We will implement them in Presto using this compression code (zstd coming soon): https://github.com/airlift/aircompressor
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user omalley commented on the pull request:

        https://github.com/apache/orc/pull/20#issuecomment-207046601

        A couple of thoughts:

        • So Presto has a writer now? If so, we should extend the WriterVersion with a new id for it so that we can handle bugs that are specific to that implementation.
        • I assume you'll follow the same compression block structure that we used with zlib and snappy. (Three bytes of header followed by the untagged compressed bytes.) I'm very concerned that we don't break compatibility with the C++ and Java readers.
        • That is really cool that there is a pure Java LZO, Snappy, and LZ4 implementation that is Apache licensed. Does it have a release that is pushed to Maven central?
        Show
        githubbot ASF GitHub Bot added a comment - Github user omalley commented on the pull request: https://github.com/apache/orc/pull/20#issuecomment-207046601 A couple of thoughts: So Presto has a writer now? If so, we should extend the WriterVersion with a new id for it so that we can handle bugs that are specific to that implementation. I assume you'll follow the same compression block structure that we used with zlib and snappy. (Three bytes of header followed by the untagged compressed bytes.) I'm very concerned that we don't break compatibility with the C++ and Java readers. That is really cool that there is a pure Java LZO, Snappy, and LZ4 implementation that is Apache licensed. Does it have a release that is pushed to Maven central?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user martint commented on the pull request:

        https://github.com/apache/orc/pull/20#issuecomment-207187882

        > So Presto has a writer now? If so, we should extend the WriterVersion with a new id for it so that we can handle bugs that are specific to that implementation.

        We're going to start working on a writer soon.

        > I assume you'll follow the same compression block structure that we used with zlib and snappy. (Three bytes of header followed by the untagged compressed bytes.) I'm very concerned that we don't break compatibility with the C++ and Java readers.

        Yeah, most likely.

        > That is really cool that there is a pure Java LZO, Snappy, and LZ4 implementation that is Apache licensed. Does it have a release that is pushed to Maven central?

        http://search.maven.org/#artifactdetails%7Cio.airlift%7Caircompressor%7C0.3%7Cjar

        Show
        githubbot ASF GitHub Bot added a comment - Github user martint commented on the pull request: https://github.com/apache/orc/pull/20#issuecomment-207187882 > So Presto has a writer now? If so, we should extend the WriterVersion with a new id for it so that we can handle bugs that are specific to that implementation. We're going to start working on a writer soon. > I assume you'll follow the same compression block structure that we used with zlib and snappy. (Three bytes of header followed by the untagged compressed bytes.) I'm very concerned that we don't break compatibility with the C++ and Java readers. Yeah, most likely. > That is really cool that there is a pure Java LZO, Snappy, and LZ4 implementation that is Apache licensed. Does it have a release that is pushed to Maven central? http://search.maven.org/#artifactdetails%7Cio.airlift%7Caircompressor%7C0.3%7Cjar
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/orc/pull/20

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/orc/pull/20
        Hide
        owen.omalley Owen O'Malley added a comment -

        I just committed this. Thanks, David!

        Show
        owen.omalley Owen O'Malley added a comment - I just committed this. Thanks, David!
        Hide
        leftylev Lefty Leverenz added a comment -

        Doc note: Eventually the documentation should be updated.

        Now it just says: "If the ORC file writer selects a generic compression codec (zlib or snappy), every part of the ORC file except for the Postscript is compressed with that codec." And the Hive ORC documentation says: "The codec can be Snappy, Zlib, or none."

        LZO should also be added.

        Show
        leftylev Lefty Leverenz added a comment - Doc note: Eventually the documentation should be updated. Now it just says: "If the ORC file writer selects a generic compression codec (zlib or snappy), every part of the ORC file except for the Postscript is compressed with that codec." And the Hive ORC documentation says: "The codec can be Snappy, Zlib, or none." LZO should also be added. ORC docs – Compression Hive ORC doc – Compression

          People

          • Assignee:
            electrum David Phillips
            Reporter:
            electrum David Phillips
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development