Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.6
    • Component/s: c++
    • Labels:
      None

      Description

      As far as I can tell, there is no way to use compression with the C++ DataFileReader and DataFileWriter. Adding compression of the written blocks using Boost iostreams is relatively straightforward, and I can provide a patch if people are interested.

      However, there are a couple of caveats:

      • The Windows builds of Boost don't currently include zlib support (required for compression) by default. You have to do extra work to get it.
      • I don't know whether doing it that way is compatible with other Avro implementations.
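
To make "compression of the written blocks" concrete, here is a minimal sketch in Python (not the C++ patch itself). It assumes the block layout from the Avro object container file specification: a long record count, a long byte size measured after the codec is applied, the (compressed) records, then the file's 16-byte sync marker. The function names are hypothetical and chosen only for illustration.

```python
import zlib


def encode_long(n: int) -> bytes:
    """Encode an Avro long: zig-zag, then base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int):
    """Decode one Avro long at buf[pos]; return (value, next position)."""
    shift = result = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos


def encode_block(count: int, records: bytes, sync: bytes, codec: str = "deflate") -> bytes:
    """Frame already-serialized records as one container-file block."""
    if codec == "deflate":
        # Negative wbits asks zlib for a raw deflate stream (no header, no checksum).
        co = zlib.compressobj(wbits=-15)
        body = co.compress(records) + co.flush()
    elif codec == "null":
        body = records
    else:
        raise ValueError("unsupported codec: " + codec)
    return encode_long(count) + encode_long(len(body)) + body + sync


def decode_block(buf: bytes, sync: bytes, codec: str = "deflate"):
    """Inverse of encode_block; return (record count, serialized records)."""
    count, pos = decode_long(buf, 0)
    size, pos = decode_long(buf, pos)
    body = buf[pos:pos + size]
    if buf[pos + size:pos + size + 16] != sync:
        raise ValueError("sync marker mismatch")
    if codec == "deflate":
        body = zlib.decompress(body, -15)
    return count, body
```

The size field counts the compressed bytes, so a reader can skip a block without decompressing it.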
      1. patch
        10 kB
        Daniel Russel
      2. AVRO-1414.patch
        10 kB
        Doug Cutting

        Activity

        Hudson added a comment -

        SUCCESS: Integrated in AvroJava #418 (See https://builds.apache.org/job/AvroJava/418/)
        AVRO-1414. C++: Add support for deflate-compressed data files. Contributed by Daniel Russel. (cutting: rev 1556373)

        • /avro/trunk/CHANGES.txt
        • /avro/trunk/lang/c++/CMakeLists.txt
        • /avro/trunk/lang/c++/api/DataFile.hh
        • /avro/trunk/lang/c++/impl/DataFile.cc
        • /avro/trunk/lang/c++/test/DataFileTests.cc
        Doug Cutting added a comment -

        I committed this. Thanks, Daniel!

        ASF subversion and git services added a comment -

        Commit 1556373 from Doug Cutting in branch 'avro/trunk'
        [ https://svn.apache.org/r1556373 ]

        AVRO-1414. C++: Add support for deflate-compressed data files. Contributed by Daniel Russel.

        Daniel Russel added a comment -

        Thanks for the GitHub hint; I couldn't find how to do that.

        Doug Cutting added a comment -

        This looks great. Tests pass for me, and I can read the compressed file written by C++ using Java's command line tools. +1. I will commit this soon unless someone objects.

        FYI for others, the patch for a GitHub commit can be obtained by adding '.patch' to the URL, e.g.:

        https://github.com/salilab/avrocpp/compare/d8afad009069f056168a6b10600fcf91a302b95a...compression.patch

        Daniel Russel added a comment -

        I updated my patch to

        • remove gzip
        • set the zlib parameters so that the zlib headers are not written with the compressed data (neither the Java implementation nor the spec includes them)

        Hopefully this will make it compatible with the Java implementation. It may also make sense to play with the window_bits or strategy (see http://www.boost.org/doc/libs/1_35_0/libs/iostreams/doc/classes/zlib.html), but the (unspecified) defaults seem to be used in the Java code.
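
As an illustration of the header change described above, here is a small sketch using Python's `zlib` module rather than Boost. In zlib's own API, a negative window-bits value requests a raw deflate stream without the zlib header and checksum. Boost.Iostreams exposes a comparable switch through the `noheader` field of `zlib_params`; whether the patch uses exactly that field is an assumption. The helper names below are hypothetical.

```python
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Compress to a raw deflate stream (no zlib header, no Adler-32 trailer)."""
    co = zlib.compressobj(wbits=-15)
    return co.compress(data) + co.flush()


def inflate_raw(data: bytes) -> bytes:
    """Decompress a raw deflate stream."""
    return zlib.decompress(data, -15)


def raw_decoder_accepts(data: bytes) -> bool:
    """Report whether a reader expecting raw deflate can decode the given bytes."""
    try:
        inflate_raw(data)
        return True
    except zlib.error:
        return False
```

A reader that expects raw deflate will typically reject data that still carries the zlib wrapper, which is consistent with the earlier observation that the first version of the files was not readable by Java.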

        The linked patch should now apply cleanly to svn revision 1552153: <https://github.com/salilab/avrocpp/compare/d8afad009069f056168a6b10600fcf91a302b95a...compression>

        Doug Cutting added a comment -

        This patch didn't apply cleanly. Here's a version that does.

        I also changed it to call the ZLib codec "deflate", as the Avro spec suggests. However, the files this generates are not readable by Java, indicating that something is still wrong.

        Lastly, this should perhaps not implement the non-standard "gzip" codec, since no other implementation will be able to read it, no? GZip uses the same compression as ZLib: both use the deflate codec, but GZip has extra headers.
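
The relationship between the three formats can be checked with a short Python sketch. It assumes the standard framing sizes: a 2-byte header and 4-byte Adler-32 trailer for zlib, and a 10-byte header (no optional fields) plus an 8-byte CRC-32/length trailer for gzip as written by zlib's defaults. The helper names are hypothetical.

```python
import zlib


def compress_as(data: bytes, wbits: int) -> bytes:
    """wbits=-15: raw deflate, 15: zlib wrapper, 31: gzip wrapper."""
    co = zlib.compressobj(wbits=wbits)
    return co.compress(data) + co.flush()


def strip_zlib(data: bytes) -> bytes:
    """Drop the 2-byte zlib header and the 4-byte Adler-32 trailer."""
    return data[2:-4]


def strip_gzip(data: bytes) -> bytes:
    """Drop the 10-byte gzip header (no optional fields) and the 8-byte trailer."""
    return data[10:-8]


def inflate_raw(data: bytes) -> bytes:
    """Decode a bare deflate stream."""
    return zlib.decompress(data, -15)
```

With the wrappers stripped, both the zlib and the gzip output decode as plain deflate, so a separate "gzip" codec would add framing bytes rather than a different compression method.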

        Daniel Russel added a comment -

        You can inspect the patch at <https://github.com/salilab/avrocpp/compare/compression>. I don't see an easy way of downloading it from there, though, so it is attached too.

        Doug Cutting added a comment -

        This would be great to have. We can test compatibility against other implementations by putting more compressed files in share/test/data and having unit tests that validate against those.
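
A cross-implementation check of this kind could look roughly like the following Python sketch. It is written against the container layout in the Avro specification (magic bytes, metadata map, sync marker, then blocks) and is not taken from the Avro test suite; the function names are hypothetical, and it only inspects the first block of a file.

```python
import zlib

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(buf: bytes, pos: int):
    """Decode one zig-zag varint long at buf[pos]; return (value, next position)."""
    shift = result = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos


def read_bytes(buf: bytes, pos: int):
    """Decode a length-prefixed byte string."""
    n, pos = read_long(buf, pos)
    return buf[pos:pos + n], pos + n


def read_header(buf: bytes):
    """Return (metadata dict, sync marker, offset of the first block)."""
    if buf[:4] != MAGIC:
        raise ValueError("not an Avro container file")
    pos, meta = 4, {}
    while True:
        count, pos = read_long(buf, pos)
        if count == 0:  # a zero count ends the metadata map
            break
        if count < 0:   # a negative count is followed by the block's byte size
            count = -count
            _size, pos = read_long(buf, pos)
        for _ in range(count):
            key, pos = read_bytes(buf, pos)
            value, pos = read_bytes(buf, pos)
            meta[key.decode("utf-8")] = value
    return meta, buf[pos:pos + 16], pos + 16


def first_block_payload(buf: bytes):
    """Return (codec, record count, decompressed records) for the first block."""
    meta, sync, pos = read_header(buf)
    codec = meta.get("avro.codec", b"null")
    count, pos = read_long(buf, pos)
    size, pos = read_long(buf, pos)
    body = buf[pos:pos + size]
    if buf[pos + size:pos + size + 16] != sync:
        raise ValueError("sync marker mismatch")
    if codec == b"deflate":
        body = zlib.decompress(body, -15)  # raw deflate, no zlib header
    elif codec != b"null":
        raise ValueError("unsupported codec")
    return codec, count, body
```

A unit test could run `first_block_payload` over each fixture file written by another implementation and compare the record count and bytes with the uncompressed reference file; the fixture names would depend on what is actually checked in under share/test/data.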


          People

          • Assignee:
            Daniel Russel
            Reporter:
            Daniel Russel
          • Votes:
            0
            Watchers:
            5
