Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 1.0.2, 0.23.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Google released Zippy as an open source (APLv2) project called Snappy (http://code.google.com/p/snappy). This issue tracks integrating it into Hadoop.

      Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

      1. HADOOP-7206-20120302.txt
        50 kB
        Vinod Kumar Vavilapalli
      2. HADOOP-7206-20120223.txt
        50 kB
        Vinod Kumar Vavilapalli
      3. HADOOP-7206new-c.patch
        52 kB
        Alejandro Abdelnur
      4. HADOOP-7206new-b.patch
        52 kB
        Alejandro Abdelnur
      5. HADOOP-7206new.patch
        45 kB
        Alejandro Abdelnur
      6. HADOOP-7206revertplusnew-b.patch
        51 kB
        Alejandro Abdelnur
      7. HADOOP-7206revertplusnew.patch
        49 kB
        Alejandro Abdelnur
      8. v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
        18 kB
        T Jake Luciani
      9. v4-HADOOP-7206-snappy-codec-using-snappy-java.txt
        18 kB
        T Jake Luciani
      10. v3-HADOOP-7206-snappy-codec-using-snappy-java.txt
        19 kB
        T Jake Luciani
      11. v2-HADOOP-7206-snappy-codec-using-snappy-java.txt
        19 kB
        T Jake Luciani
      12. HADOOP-7206-002.patch
        38 kB
        issei yoshida
      13. HADOOP-7206.patch
        38 kB
        issei yoshida

        Issue Links

          Activity

          Todd Lipcon added a comment -

          Do you think we should integrate it into Hadoop proper, or start a separate project to implement it as a pluggable codec?

          The advantage of doing it as a separate project is that we could provide it for existing Hadoop versions without having to re-release Hadoop. Also allows for quicker bug fixes and granular upgrades when issues are identified.

          Allen Wittenauer added a comment -

          The disadvantage is that it is yet another reason not to get an Apache release out. After all, if all the 'good bits' have been back ported, why move forward?

          So no, please don't break it out, please don't back port. Or is the goal to see if we can make it to three years before we get a new, solid release out the door?

          Todd Lipcon added a comment -

          Hey Allen. When there are real customers (either external customers in my case or internal customers at a lot of the bigger companies using Hadoop) who need a feature, it's not always possible to say no.

          Given the choice between putting it in core (and most likely having major users and distros backport) vs putting it in an external library, I prefer the external library. I don't follow your reasoning that putting it in core will speed up an Apache release – when has adding new code to a project ever made its release cycle faster?

          FWIW I have expressed this same opinion even on projects like HBase where releases are frequent and people run trees that are very close to the Apache bits. Keeping projects small makes releases easier, not harder, and frequent releases means fewer backports.

          Allen Wittenauer added a comment -

          I don't really care what Cloudera does as a business.

          But Apache's track record of making everything a separate project in order to release faster has failed. We'd also be better off if folks would resist the temptation to back-port anything and everything.

          We need to draw the line, and it might as well start here.

          Aaron Kimball added a comment -

          +1 on modular separation. Forcing folks to upgrade an entire cluster to allow them to use a compression library in their client MR programs seems like poor design. Compression is pluggable in Hadoop for a reason.

          Allen Wittenauer added a comment -

          I'm thinking beyond "just a compression library".

          Benoit Sigoure added a comment -

          Allen, I agree that the situation with back-porting everything in Hadoop has gotten to a ridiculous point. So while I understand and share your desire to stop back-porting things or splitting things out, you also have to understand the desire of users like us whose business greatly depends on Hadoop/HBase and who need to move forward quickly. If we were to wait until Apache releases a version of Hadoop we can use in production with HBase (proper append, no data loss, etc.), we'd still be waiting. So although I don't like the current situation with Hadoop either, I'm glad someone did the grungy work of back-porting things or splitting some things out so we could move forward.

          What Todd is proposing is simply a way to make Snappy available quickly to users like us, and we'd be very happy about that. I think it's in everyone's interest to make this available as soon as possible and not wait for a future Hadoop release.

          Note: we're not Cloudera customers, but we use CDH because It Just Works.

          issei yoshida added a comment -

          I have started developing an external library for Snappy compression:
          http://code.google.com/p/hadoop-snappy/

          This library can easily be added to an existing Hadoop cluster,
          though I also hope to integrate it into Hadoop core.

          I would like to hear your advice on improving the library.
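          Enabling a pluggable codec like this on an existing cluster generally comes down to registering its class in the configuration. A minimal sketch; the `SnappyCodec` class name is assumed from this patch series, so verify it against the build you actually deploy:

          ```xml
          <!-- core-site.xml: register the Snappy codec alongside the defaults.
               The exact codec class name is an assumption; check your hadoop-snappy build. -->
          <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
          </property>
          ```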

          Taro L. Saito added a comment -

          Hi all,

          I hadn't noticed the discussion in this thread, but I recently developed snappy-java, a Java port of Google's Snappy:
          http://code.google.com/p/snappy-java/

          Snappy-java is based on JNI, but is designed to be portable across various platforms (Win/Mac/Linux, 32- and 64-bit CPUs).
          It might be useful for quickly testing Snappy from Java.

          Todd Lipcon added a comment -

          Hi Issei. Thanks for starting the work to port Snappy to a compression codec! The only concern I have with your codebase is that it seems to be derived from the hadoop-lzo codebase, which has portions of the Java code licensed under the GPL.

          As I understand it, so long as all of the copyright owners (i.e., the authors) of the LZO codebase agree, we can relicense the Java portions of that code as Apache or BSD. I believe the main author list is:

          • me
          • Kevin Weil
          • Owen O'Malley
          • Hong Tang
          • Chris Douglas

          Do others agree with this assessment?

          Chris Douglas added a comment -

          Arun was the original author of that module.

          Hong Tang added a comment -

          I have no objection to relicensing the Java portion of the code as Apache or BSD.

          issei yoshida added a comment -

          Hi Todd. Thank you for telling me about the license problem.
          I will keep the license as GPL until all the authors agree.

          Todd Lipcon added a comment -

          I corresponded with all of the above people (Kevin, Owen, Hong, Chris, Arun) and all are +1 on re-licensing the Java portions to Apache 2 license.

          issei yoshida added a comment -

          Todd, thanks for corresponding with the authors.
          I'm looking forward to their reply and to changing the license to Apache 2.

          issei yoshida added a comment -

          Hi all,

          Thank you for giving permission to relicense hadoop-snappy.

          I created a patch that adds Snappy compression support to the trunk of the Hadoop SVN repository.

          Its native library is part of libhadoop and requires Snappy v1.0.2 or higher.
          (It uses the C interface rather than the C++ interface, and loads the Snappy native library dynamically via dlopen.)

          I would like to hear your advice.
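          The dynamic-loading approach described above can be sketched in C. This is an illustrative sketch, not the actual libhadoop code: the soname fallback order and the graceful degradation when the library is absent are assumptions, while the function name `snappy_max_compressed_length` comes from Snappy's public C header (snappy-c.h).

          ```c
          /* Sketch: resolve the Snappy C interface at runtime via dlopen/dlsym
           * instead of linking against the C++ API. Illustrative only. */
          #include <dlfcn.h>
          #include <stdio.h>

          /* Signature from snappy-c.h: worst-case compressed size for an input length. */
          typedef size_t (*max_compressed_length_fn)(size_t source_length);

          int main(void) {
              /* Try the usual versioned soname first, then the unversioned dev symlink. */
              void *handle = dlopen("libsnappy.so.1", RTLD_LAZY);
              if (!handle)
                  handle = dlopen("libsnappy.so", RTLD_LAZY);
              if (!handle) {
                  /* Degrade gracefully when the native library is not installed. */
                  printf("snappy native library not available: %s\n", dlerror());
                  return 0;
              }
              max_compressed_length_fn max_len = (max_compressed_length_fn)
                  dlsym(handle, "snappy_max_compressed_length");
              if (max_len)
                  printf("snappy loaded; worst-case size for 4096 input bytes: %zu\n",
                         max_len(4096));
              else
                  printf("snappy loaded but symbol missing: %s\n", dlerror());
              dlclose(handle);
              return 0;
          }
          ```

          Loading through dlopen keeps libhadoop buildable and runnable on machines without Snappy installed, which is the portability benefit the comment above alludes to.
          
          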

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12479073/HADOOP-7206.patch
          against trunk revision 1102123.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.io.compress.TestCodec

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/444//testReport/
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/444//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/444//console

          This message is automatically generated.

          issei yoshida added a comment -

          As a result of talking with Alejandro and Tom,
          we have agreed to give up on the integration into Hadoop core and cancel HADOOP-7206:
          to avoid confusing users/developers, the code should live in one place, either as a separate project or integrated into Hadoop core.
          As an independent project, hadoop-snappy has the flexibility to do releases and bug fixes independent of Hadoop releases,
          so I will continue to develop hadoop-snappy only.

          Thanks.

          Tom White added a comment -

          I actually think that, long-term, Snappy compression belongs in Hadoop, along with the other compression codecs. The hadoop-snappy project is a useful stopgap until we get regular releases going again, and it allows other projects like HBase to use Snappy in the meantime.

          T Jake Luciani added a comment -

          We have a Snappy codec in Brisk based on the snappy-java project: http://code.google.com/p/snappy-java/

          We did this because snappy-java is ASL-licensed, available in Maven, and comes pre-built with shared libs for Windows, Linux, and OS X. For end users this is a much lower bar for getting it into their system.

          https://github.com/riptano/brisk/tree/master/src/java/src/com/hadoop/compression/snappy

          I'm confused about how to contribute this back. Based on the above comments, should we submit a patch or release this as a standalone GitHub project?

          Alejandro Abdelnur added a comment -

          Jake,

          1. Snappy-Java bundles the native libraries in the JAR itself. While that is a convenient/clever packaging technique, it differs from how Hadoop handles native libraries (loading them from lib/native/${OS_ARCH}/).

          2. The motivation for keeping hadoop-snappy independent of Hadoop was that we could use it right away in other projects (HBase has already integrated it).

          I would strongly argue that native libraries should be handled in a consistent manner in Hadoop.

          And, if the preference of the Hadoop folks is to bundle Snappy in Hadoop (dismissing #2), then I'd advocate for bringing hadoop-snappy into Hadoop as this JIRA originally proposed. By doing this we would have one external dependency (snappy) instead of two (snappy-java and snappy, with the side effect that if we need a new version of snappy we would have to wait for snappy-java to do a release with it).

          Thoughts?

          T Jake Luciani added a comment -

          I agree with 1): this should be in the source tree. However, if 2) is the stopgap, then it should be as easy as possible for folks to use these plugins until then (hence the choice of snappy-java).

          IMO the Apache builds should come with native libraries for common platforms, as well as packages. I'd be happy to see hadoop-snappy in the src, but without proper distribution it's a barrier to entry for using native libraries.

          Alejandro Abdelnur added a comment -

          Following up on this, Issei will update his original patch to bring hadoop-snappy into hadoop-common.

          issei yoshida added a comment -

          Updated the patch for the latest Hadoop trunk and changed the compression overhead.

          issei yoshida added a comment -

          I updated the patch for the latest Hadoop trunk.

          I would like to hear your advice.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482633/HADOOP-7206-002.patch
          against trunk revision 1135820.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.io.compress.TestCodec

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/634//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/634//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/634//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482633/HADOOP-7206-002.patch
          against trunk revision 1135820.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.io.compress.TestCodec

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/638//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/638//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/638//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482633/HADOOP-7206-002.patch
          against trunk revision 1135820.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.io.compress.TestCodec

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/640//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/640//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/640//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

The test is failing because the snappy shared library (.so) is not installed/available on the build machine.

Options that I see for this are:

1* Install snappy on the build machines
2* Pull the snappy source code into Hadoop's native code

While #2 is simpler, it means we would be forking snappy for Hadoop's use. We could do this until snappy is available as a package (RPM/DEB).

          Thoughts?

          T Jake Luciani added a comment -

This is why the snappy-java approach works well, IMO.

          issei yoshida added a comment -

          Alejandro,

How would we drag the Snappy code into Hadoop trunk?
Would we pull in the Snappy svn repository from Google Code as an svn external?
I think that would make Hadoop trunk more complex.

IMO, #1 is the simpler approach.

          Doug Cutting added a comment -

          What are the downsides of building on snappy-java? It seems to be updated frequently, is published in Maven, and would avoid loader conflicts with other Java projects that use snappy-java.

          T Jake Luciani added a comment -

          I've attached the snappy-java version of this codec for consideration

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482838/v1-0001-HADOOP-7206-snappy-codec-using-snappy-java.txt
          against trunk revision 1136249.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/643//console

          This message is automatically generated.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482850/v2-HADOOP-7206-snappy-codec-using-snappy-java.txt
          against trunk revision 1136249.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/644//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/644//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/644//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

For both patches, the codec should be defined in the codecs property in core-default.xml

          T Jake Luciani added a comment -

          v3 adds SnappyCodec to core-default.xml

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482888/v3-HADOOP-7206-snappy-codec-using-snappy-java.txt
          against trunk revision 1136249.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/646//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/646//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/646//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

          Jake, I've tested your patch and it works.

A few comments on the patch:

          • the change in build.xml seems unnecessary.
          • the SnappyCompressor.compress() method has an INFO-level log; it should be removed, or made a DEBUG log inside an if-debug-enabled block. Also, what it prints seems wrong (the reported compressed size is always bigger than the outBuf)
          Alejandro Abdelnur added a comment -

          Extra comment: indentation of java code should be 2 spaces, not 4

          T Jake Luciani added a comment -

          Thanks for the feedback. Incorporated those changes into v4

          Tom White added a comment -

          I noticed that the compression overhead in this patch is (bufferSize >> 3) + 128 + 3 which is less than the maximum possible blowup that Snappy allows for (http://code.google.com/p/snappy/source/browse/trunk/snappy.cc#55). Should this be changed to bufferSize / 6 + 32?
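
          To make the comparison concrete, here is a small sketch (not part of the patch; class and method names are illustrative) contrasting the patch's overhead estimate with the worst-case overhead implied by Snappy's documented bound, MaxCompressedLength(n) = 32 + n + n/6:

          ```java
          // Sketch only: compares the patch's compression-overhead estimate with the
          // worst-case overhead from snappy.cc's MaxCompressedLength(n) = 32 + n + n/6.
          class SnappyOverheadCheck {
            // Overhead reserved by the current patch.
            static int patchOverhead(int bufferSize) {
              return (bufferSize >> 3) + 128 + 3;
            }

            // Worst-case overhead per Snappy's own bound.
            static int worstCaseOverhead(int bufferSize) {
              return bufferSize / 6 + 32;
            }

            public static void main(String[] args) {
              int bufferSize = 64 * 1024;
              System.out.println("patch estimate: " + patchOverhead(bufferSize));     // 8323
              System.out.println("worst case:     " + worstCaseOverhead(bufferSize)); // 10954
            }
          }
          ```

          For a 64 KB buffer the patch reserves 8323 bytes of headroom while the bound requires 10954, so an incompressible input could overflow the output buffer.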

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483020/v4-HADOOP-7206-snappy-codec-using-snappy-java.txt
          against trunk revision 1137065.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/652//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/652//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/652//console

          This message is automatically generated.

          T Jake Luciani added a comment -

Hi Tom, good find. Actually snappy-java exposes this calculation, so perhaps it should change to

          Snappy.maxCompressedLength(bufferSize) - bufferSize;
          

          Agree?

          T Jake Luciani added a comment -

          v5 includes mentioned change

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483021/v5-HADOOP-7206-snappy-codec-using-snappy-java.txt
          against trunk revision 1137065.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/653//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/653//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/653//console

          This message is automatically generated.

          Tom White added a comment -

          I've just committed this. Thanks T Jake and Issei.

          Tsz Wo Nicholas Sze added a comment -

          HDFS cannot be compiled; see HADOOP-7407. How about we revert the patch?

          [ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve]           ::          UNRESOLVED DEPENDENCIES         ::
          [ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve]           :: org.xerial.snappy#java-snappy;1.0.3-rc2: not found
          [ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
          
          Tsz Wo Nicholas Sze added a comment -

          BTW, who has reviewed the patch? It seems there is no +1 on the patch.

          T Jake Luciani added a comment -

Can we fix HADOOP-7407 rather than revert this one? Looks like a typo.

          Tom White added a comment -

          I've committed the fix to HADOOP-7407 after manually verifying it.

          > BTW, who has reviewed the patch?

          I reviewed the patch. I marked it as "Reviewed" when committing it.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #661 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/661/)

          Tsz Wo Nicholas Sze added a comment -

          > Can we fix #7407 rather than revert this one? looks like a typo.

          Sure. Thanks.

          > I reviewed the patch. I marked it as "Reviewed" when committing it.

          Hi Tom, the patch is missing javadoc for the new public classes/methods. That was also a reason that I suggested reverting it.

          Tom White added a comment -

          Nicholas, I agree that javadoc is needed. Thanks for pointing it out.

          T Jake, would you like to create a new patch which adds javadoc? I think a new JIRA would be fine.

          T Jake Luciani added a comment -

          Sure. Added in HADOOP-7408

          Tsz Wo Nicholas Sze added a comment -

          Let's fix the javadoc in HADOOP-7408. Thanks T Jake and Tom.

          Roman Shaposhnik added a comment -

I'd be curious to find out whether the lack of Solaris binaries under /org/xerial/snappy/native/ in the jar bothers anybody:
          http://maven.xerial.org/repository/artifact/org/xerial/snappy/snappy-java/1.0.3-rc3/snappy-java-1.0.3-rc3.jar

          Allen Wittenauer added a comment -

          With the current state of trunk, native compression only works on Linux anyway without reworking the entire libhadoop structure. See HADOOP-7405.

          Todd Lipcon added a comment -

          Sorry, I stopped paying attention to this for a while... I have some concerns about the way this ended up:

We're now pulling in a jar which auto-expands its .so dependency into /tmp and then loads native libraries that way. That's (a) messy, (b) potentially insecure without workarounds to change /tmp to some other dir, and (c) inconsistent with how our native libraries work. These are the same arguments Alejandro made above.

The maven artifact that we now depend on is something that isn't easy to rebuild, and it's not even clear how it gets built. For example, which glibc version is it linked against? Which OS X version is the included dylib built on? Seems a little scary as a dependency.

          It seems the motivation to switch from the hadoop-snappy approach to the java-snappy approach was that the former approach depended on having snappy.so available at runtime, which isn't always the case. I would propose the following:

          • at build time, you can choose (a) disable snappy, (b) enable snappy and dynamically link our JNI shims against snappy.so, or (c) enable snappy and statically link against snappy.so
          • those who don't care about snappy choose (a)
• those who care about snappy and plan to deploy on systems where libsnappy.so is deployed system-wide (e.g. Fedora or recent Ubuntu releases) can choose (b) to pick up the snappy lib off the system
          • those who care about snappy and plan to deploy elsewhere choose (c), and just make sure that snappy is available at compile time

          Then the hadoopsnappy.so can be included in lib/native just like our other native dependencies without the unjar-to-tmp hackery.

          Does this idea address everyone's goals?
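
          As a rough illustration of options (b) and (c), the JNI shim could be linked roughly as follows. This is a build-configuration sketch only: the source file names, output name, and exact flags are hypothetical, just to show the difference between dynamic and static linking.

          ```shell
          # Hypothetical build commands, for illustration only.

          # (b) dynamic: the JNI shim resolves the system libsnappy.so at runtime
          g++ -shared -fPIC -o libhadoopsnappy.so SnappyCompressor.c SnappyDecompressor.c \
              -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" -lsnappy

          # (c) static: snappy's object code is folded into the shim at build time,
          # so nothing snappy-related needs to be installed on the target host
          g++ -shared -fPIC -o libhadoopsnappy.so SnappyCompressor.c SnappyDecompressor.c \
              -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" \
              -Wl,-Bstatic -lsnappy -Wl,-Bdynamic
          ```

          Either way the resulting libhadoopsnappy.so would live in lib/native alongside the other native code, with no unjar-to-tmp step.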

          Eli Collins added a comment -

I think so. Seems like we could get away with (a) and (b) for now; i.e., those who care about snappy and plan to deploy elsewhere could reasonably use dynamic linking as well.

          Taro L. Saito added a comment -

Let me clarify some differences between Issei's hadoop-snappy and my snappy-java:

          hadoop-snappy

          • Uses libsnappy.so (available in recent Linux distributions) and libhadoopsnappy.so (JNI code compiled for the target platform)

          snappy-java

• Uses libsnappyjava.so (which bundles the original snappy and the JNI code together), or snappyjava.dll (for Windows) / libsnappyjava.jnilib (for Mac OS X)
          • It copies one of these native libraries to the directory specified by the org.xerial.snappy.tempdir or java.io.tmpdir system property.
          • If the dependencies on glibc (on Linux, GLIBC 2.3 or higher is currently required) or on the dylib (on Mac OS X) cause problems, you can re-compile snappy-java's native library for your own platform only (with make clean-native native). There is no need to build native libraries for the other platforms if you never use them.

What hadoop-snappy and snappy-java have in common is:

          • Both approaches need to compile the native code (libhadoopsnappy.so or libsnappyjava.so) somewhere. My snappy-java simply provides pre-compiled libsnappyjava.so for various platforms.

One of the design goals of snappy-java is to avoid trouble in linking against native libraries (e.g., libsnappy.so), such as crashes due to libstdc++ compatibility, missing libraries, etc. But as Alejandro suggested in my discussion group, using separate libsnappy.so and libsnappyjava.so is technically possible even in snappy-java:

          • First, try to load pre-installed libsnappy.so and libsnappyjava.so (a version that does not contain libsnappy.so)
          • If they are not found, extract the copies embedded in the JAR somewhere.
          • Load both native libraries.

          I am not sure that supporting such a loading mechanism is the right way to go.
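
          For Hadoop's purposes, the extraction to the temp directory could at least be redirected to a controlled location. A minimal sketch, assuming the org.xerial.snappy.tempdir property name described above; the directory path and class name are purely illustrative:

          ```java
          // Sketch: redirect snappy-java's native-library extraction away from the
          // default temp directory. This must run before org.xerial.snappy.Snappy is
          // first loaded, because extraction happens during its class initialization.
          class SnappyTempDirSetup {
            static void configure() {
              // Illustrative path; a deployment would use a secured, node-local dir.
              System.setProperty("org.xerial.snappy.tempdir",
                  System.getProperty("java.io.tmpdir") + "/hadoop-snappy-native");
            }

            public static void main(String[] args) {
              configure();
              System.out.println(System.getProperty("org.xerial.snappy.tempdir"));
            }
          }
          ```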

          Alejandro Abdelnur added a comment -

          After mulling over this issue a bit more, reading Todd's comment a few times, and asking around among folks who deal with native libs, I'm having second thoughts about the committed patch based on snappy-java.

          The snappy-java approach is tempting because it 'just works' (without having to install the snappy SO on your system). However, it has a serious drawback: the native code is not built on the target OS, only for the same architecture. Because of this the build is not easily reproducible, as there is no knowledge of the OS used to build it. In addition, this can lead to unavailable dependencies in the running OS.

          The hadoop-snappy approach has the drawback that it requires an additional step (installing the snappy SO on the platform), but in return it addresses the drawbacks of the snappy-java approach: the native code is built on the target OS, resulting in easily reproducible builds. Furthermore, the drawback is transient, lasting only until snappy is available in the different OSes by default or via OS-driven updates.

          A secondary issue is that the snappy-java native lib statically links snappy. Once the snappy SO makes it into standard Linux distributions, snappy-java will use a private copy of it instead of the one installed in the OS. On the other hand, the hadoop-snappy SO dynamically links the snappy SO; when the snappy SO is available in the OS, it can be consumed directly from there. (This could be taken care of by snappy-java if it changes to dynamically link the snappy SO.)

          Because of this I'd like to revert the snappy-java based patch and go for Issei's hadoop-snappy patch.

          T Jake Luciani added a comment -

          However, it has a serious drawback: the native code is not built on the target OS, only for the same architecture. Because of this the build is not easily reproducible, as there is no knowledge of the OS used to build it. In addition, this can lead to unavailable dependencies in the running OS.

          Why not let snappy-java implement the (check for libsnappy.so on the system before using the bundled one) as Taro mentioned?

          Alejandro Abdelnur added a comment -

          @Jake

          snappy-java does not load the vanilla snappy SO; it loads the snappy-java SO, which is a combination of the snappy SO and the JNI bindings for Java.

          Assuming that snappy-java splits the snappy and snappy-java native code into two SOs as I suggested before on the snappy-java alias (and Taro is not convinced of that approach, see the end of his comment) ... the snappy SO would be loaded from the system, but the snappy-java SO would still be loaded from the JAR. This means that unavailable dependencies could still happen for the snappy-java SO. And if you want a snappy-java SO built for your OS, you'd have to build it yourself but still consume the JAR that comes as an external dependency.

          IMO this is a big NO NO. I'd rather have some extra setup work until the snappy SO is commonly available with the OSes.

          Scott Carey added a comment -

          However, it has a serious drawback: the native code is not built on the target OS, only for the same architecture. Because of this the build is not easily reproducible, as there is no knowledge of the OS used to build it.

          Sure it is reproducible: snappy is used as an artifact, not built from source. The build is reproducible because it always uses the same artifact and always produces the same output. Is it a requirement to recompile all Java jars to be reproducible?

          hadoop-snappy has another drawback/benefit pair:

          Users may have snappy-java in their paths for their own use (for example via Avro, Hive, Hbase, or user code).
          Drawback: the library can't be shared, bloating the # of classes and jars
          Benefit: the library won't have a version conflict
          Unknown(to me): does a snappy-java binding conflict with a hadoop custom one if both are loaded in the same JVM / Classloader?

          I think the check for a system available libsnappy.so prior to loading the one in the jar should go into the snappy-java project, then users can optionally compile one and make it available to Hadoop, or use the one in the jar, and Hadoop has to maintain less code and build infrastructure as a result.

          Alejandro Abdelnur added a comment -

          @Scott,

          snappy-java downloads snappy source TAR and builds it.

          Regarding your 'unknown': last time I checked, the snappy-java SO and the snappy SO could not be loaded at the same time in a JVM; the JVM would core dump.

          Scott Carey added a comment -

          IMO this is a big NO NO. I'd rather have some extra setup work until the snappy SO is commonly available with the OSes.

          This may happen with Linux and BSD within a couple years, but likely will never happen for anything else.

          snappy-java will be in user classpaths anyway. I doubt every other project will go with this method (and if they did, the work should be shared, not live in Hadoop. If we feel strongly that snappy-java is doing it wrong, fork it into a new project and let everyone benefit!). Other projects without the manpower to maintain this will just use snappy-java as is. Avro has done this already and I'm sure others will too. Including a jar as a dependency and having it 'just work' for 99% of users is an easy win, especially since Snappy is an optional compression codec for the vast majority of use cases.

          Scott Carey added a comment -

          Regarding your 'unknown': last time I checked, the snappy-java SO and the snappy SO could not be loaded at the same time in a JVM; the JVM would core dump.

          -1 to coredumps. I already have snappy-java in my classpath in M/R jobs.

          snappy-java downloads snappy source TAR and builds it.

          Ah, you mean that snappy-java's build is not reproducible. Yes, I agree. I thought you meant that Hadoop was not reproducible if it used snappy-java.

          Taro L. Saito added a comment -

          @Scott @Alejandro

          The current version, snappy-java-1.0.3-rc3, can load libsnappy.so and libsnappyjava.so at the same time. I fixed the build options to avoid a collision of the snappy API between the two native libraries.

          Taro L. Saito added a comment -

          I noticed that reusing libsnappy.so between multiple Java libraries is not a good approach, since the JVM cannot load the same native library twice.

          If Hadoop loads libsnappy.so, other third-party libraries cannot use libsnappy.so unless the Snappy API used by the initial loader (i.e. Hadoop) is exposed to the users.

          Although I haven't fully confirmed that the co-existence of libsnappy.so and libsnappyjava.so does not cause any serious problems, if this approach works well, Hadoop should integrate Issei's hadoop-snappy, which uses a pre-installed libsnappy.so. Other Hadoop users can then use snappy-java by simply putting the jar on their classpath.

          Allen Wittenauer added a comment -

          How does this work with 32-bit vs. 64-bit? Do you push different jars?

          Taro L. Saito added a comment -

          @Allen

          snappy-java checks the os.arch system property, which shows different values for 32- and 64-bit CPUs. A single snappy-java jar file bundles native libraries for both 32- and 64-bit CPUs, and one of them is extracted according to the machine environment.
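          That selection amounts to a pure mapping from the os.name/os.arch properties to a resource path inside the jar. A rough sketch (the directory layout and file names below are illustrative guesses, not snappy-java's actual jar layout):

          ```java
          // Sketch: choose a bundled native library by os.name / os.arch.
          // The "/native/<os>/<arch>/" resource layout is an illustrative guess,
          // not snappy-java's actual jar layout.
          public class NativeResourcePath {

              static String resourcePathFor(String osName, String osArch) {
                  String os = osName.startsWith("Windows") ? "Windows" : osName;
                  String lib;
                  if (os.equals("Windows")) {
                      lib = "snappyjava.dll";
                  } else if (os.equals("Mac OS X")) {
                      lib = "libsnappyjava.jnilib";
                  } else {
                      lib = "libsnappyjava.so";
                  }
                  return "/native/" + os.replace(' ', '_') + "/" + osArch + "/" + lib;
              }

              public static void main(String[] args) {
                  // A 64-bit Linux JVM typically reports os.arch as "amd64",
                  // a 32-bit one as "i386"; each maps to a different bundled copy.
                  System.out.println(resourcePathFor("Linux", "amd64"));
                  System.out.println(resourcePathFor("Linux", "i386"));
              }
          }
          ```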

          Scott Carey added a comment -

          Would it be reasonable to split snappy-java up into multiple Maven modules, so that users can choose which parts they want? Perhaps they want the Java portions but none of the JNI, or want the JNI wrapper but not the implementation, etc. A snappy-java jar could still be built with all artifacts bundled, but parts could also be chosen à la carte without a custom build of snappy-java.

          I think it is best for users if as many use cases as possible were handled in one central library that could be shared by multiple projects. Any work done for Hadoop is likely to be repeated elsewhere.

          Allen Wittenauer added a comment -

          How safe is os.arch on non-Sun JVMs? How safe are the .so's on non-Sun JVMs?

          Taro L. Saito added a comment -

          @Scott

          The JNI interface is tightly coupled with the Java code, so splitting the Java/JNI/native libraries would not work as you expect. A reasonable option I can think of is to provide snappy-java without native libraries. In this case, users need to compile/install snappy-java's native library (make install to copy libsnappy.so to /usr/local/lib) and use a jar file without native libraries. This style fits users who need to compile everything from scratch. snappy-java tries to load native libraries from the path specified in java.library.path if no bundled native library is found.

          @Allen
          I am not familiar with non-Sun JVMs, but loading the .so files depends only on the locally installed C library (GLIBC 2.3 or higher in Linux). No other dependencies exist in snappy-java's pre-compiled native libraries (for Linux), because I carefully chose the compilation options to avoid extra dependencies.

          Taro L. Saito added a comment -

          Fix:
          make install to copy libsnappy.so and libsnappyjava.so to /usr/local/lib

          Todd Lipcon added a comment -

          Regarding the conflicts, it seems this is a visibility bug. In the case where the snappy code gets compiled into the same .so file as the JNI bindings, it should be built with -fvisibility=hidden, with only the JNI implementations of native functions marked as exported symbols. Anything else is pretty unclean.

          Taro L. Saito added a comment -

          @Todd

          As you mentioned, snappy-java's native libraries are now built with -fvisibility=hidden, and I already tried loading libsnappy.so and then using the API in snappy-java. No JVM crash was observed.

          issei yoshida added a comment -

          Taro,

          > I noticed that reusing libsnappy.so between multiple java libraries is not a good approach since JVM cannot load the same native libraries twice.

          The patch I uploaded before loads libsnappy.so via dlopen in the native code,
          so it probably isn't a problem.

          Alejandro Abdelnur added a comment -

          @Taro, regarding 'loading native libs twice': if using the same classloader, loading a native lib a second time is a NOP; the JVM complains only if the classloader is different from the one that did the first load.

          Allen Wittenauer added a comment -

          I'd like to see:

          a) OS X (32-bit and 64-bit) support
          b) This tested on something other than a Sun JVM, preferably IBM's since a few folks are using that.

          Also, how hard is it to add a non-bundled so?

          Taro L. Saito added a comment -

          @Issei @Alejandro
          Great. That means that as long as the same classloader is used (as Hadoop seems to do), reusing libsnappy.so between hadoop-snappy and snappy-java is no problem. Now it looks like whether to use libsnappy.so or not is a problem for snappy-java, and I prefer to use libsnappyjava.so (statically linked snappy + JNI code, built with the -fvisibility=hidden option), which can avoid potential API conflicts and missing-library problems (on some OSes).

          In my experience developing sqlite-jdbc (http://sqlite-jdbc.googlecode.com/), which uses the same technique of extracting the .so file at runtime, many people seem to be satisfied with this approach. The problems solved by runtime library extraction are failures due to misconfiguration (e.g., LD_LIBRARY_PATH, etc.) and the library build process (gcc, linker options, etc.) for each OS. For example, I frequently use Windows to develop the code but run the production code under Linux; not needing to switch the library files really helps me a lot. But, looking at HADOOP-7405, the current Hadoop native libraries are not so portable across various OSes. In such a state, the motivation for using a portable library like snappy-java might be low.

          I don't care which one is used in Hadoop, but the discussion in this thread has been useful for me to improve snappy-java. Thanks!

          @Allen
          a) OS X (32-bit/64-bit) are already supported.
          b) I need to know the os.name and os.arch system property values that the IBM JVM provides.

          Building and embedding a non-bundled so file into snappy-java is simple; just run "make".
          As a matter of fact, I do it for 6 combinations of OS and CPU. Also, by using VMware Fusion on a Mac, all of the currently supported native libraries can be compiled on a single machine. Some 64-bit OSes can be used to build 32-bit native libraries (e.g., Windows, Linux, etc.)

          Alejandro Abdelnur added a comment -

          Attached is a patch based on Issei's patch.

          • It reverts the currently committed HADOOP-7206 & HADOOP-7407
          • It uses Snappy native directly
          • It adds the JNI bindings to Hadoop native
          • Via configure.ac, if snappy is not available the Snappy JNI bindings are skipped
          • The Snappy lib is looked for (by default) under /usr/local
          • Once Snappy is available by default in the different OSes, the default lookup can be changed to /usr for automatic detection
          • Location of Snappy lib can be altered with -Dsnappy.prefix= Ant option
          • SnappyCodec is defined in core-default.xml
          • If the Snappy JNI bindings and/or Snappy are not present, SnappyCodec warns and continues

          IMO this addresses the concerns previously discussed in the JIRA. And it will not break the build on the Apache Jenkins machines if Snappy is not installed.
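          For reference, registering the codec in core-default.xml would look along these lines (a sketch only; the exact default codec list in the committed patch may differ):

          ```xml
          <!-- Sketch of the kind of entry the patch adds to core-default.xml;
               the exact value in the committed patch may differ. -->
          <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
          </property>
          ```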

          Tom White added a comment -

          TestCodec passes for me both when snappy is not installed and when snappy is installed and native compilation is enabled. +1

          Alejandro Abdelnur added a comment -

          The updated patch allows the Ant build to include the snappy SO next to the hadoop SO.

          By default this option is disabled; to enable it, invoke Ant with the '-Dbundle.snappy=true' option.

          Eli Collins added a comment -

          Hey Alejandro - mind if I revert the previous two patches and then we can apply the new one? Would be nice to have this go in a single commit.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483739/HADOOP-7206revertplusnew-b.patch
          against trunk revision 1139123.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/669//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/669//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/669//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/669//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

          Eli, I'm good. I'll prepare that patch and I'll add the missing AL

          Eli Collins added a comment -

          Great. I've reverted 7206 and 7407, you can attach a new patch now.

          Alejandro Abdelnur added a comment -

          It works on top of the reverted trunk.

          Added missing AL

          Replaced System.err.println() with log messages

          Tsz Wo Nicholas Sze added a comment -

          Hi Alejandro, could you add javadoc for the new public method/fields and @Override for the overriding methods?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483753/HADOOP-7206new.patch
          against trunk revision 1139387.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/670//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/670//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/670//console

          This message is automatically generated.

          Eli Collins added a comment -

          Looks great! Thanks for coming up with an implementation that incorporates all the discussion.

          Testing?

          Modulo Nicholas' good feedback, minor review feedback follows

          • Why is property environment="env" needed? Can we remove that same setting from the findbugs target now that it's global?
          • Indicate is -> Indicate if
          • java.library.path should use ${snappy.lib} vs ${snappy.prefix}/lib
          • #if shouldn't be indented in org_apache_hadoop_io_compress_snappy.h, and Snappy*.c, i.e. everything can be shifted left one stop.
          • 1st echo in packageNativeHadoop.sh is mis-indented
          • "io.compression.codec.snappy.buffersize" should be defined in CommonConfigurationKeys
          • In LoadSnappy I don't think we need to log hadoopNativeAvailable because that is already indicated by the output of NativeCodeLoader
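The last point above suggests centralizing the string key as a named constant. A minimal sketch of that pattern (the class name and the default value here are illustrative, not taken from Hadoop's actual CommonConfigurationKeys):

```java
// Illustrative sketch only: centralizing a codec config key and its default
// so the string literal is not scattered around the codebase.
public class SnappyCodecKeysSketch {
    /** Config key controlling the snappy codec's internal buffer size. */
    public static final String IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_KEY =
            "io.compression.codec.snappy.buffersize";

    /** Hypothetical default buffer size for this sketch: 256 KB. */
    public static final int IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_DEFAULT = 256 * 1024;

    public static void main(String[] args) {
        System.out.println(IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_KEY
                + "=" + IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_DEFAULT);
    }
}
```

Callers would then reference the constant (and its default) instead of repeating the raw string.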
          Alejandro Abdelnur added a comment -

          Incorporating Nicholas's & Eli's comments.

          Thxs.

          Alejandro Abdelnur added a comment -

          Regarding testing, I've tested it on Ubuntu, with and without the snappy shared library (.so) present.

          The <property environment="env"/> makes the ENV variables available as properties with prefix "env.".
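For readers unfamiliar with this Ant feature, a minimal sketch (the project, target, and property names here are illustrative, not from the Hadoop build):

```xml
<project name="env-demo" default="show-env">
  <!-- Exposes all environment variables as Ant properties prefixed with "env." -->
  <property environment="env"/>

  <target name="show-env">
    <!-- e.g. the HOME environment variable becomes ${env.HOME} -->
    <echo message="HOME is ${env.HOME}"/>
  </target>
</project>
```

Declared once at the top level of build.xml, the `env.*` properties are visible to every target, which is why the per-target declaration in findbugs becomes redundant.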

          Eli Collins added a comment -

          +1 I'll wait for Hudson to finish before committing.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483793/HADOOP-7206new-b.patch
          against trunk revision 1139387.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/671//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/671//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/671//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

          var mistake in build script

          Eli Collins added a comment -

          Ah, thought those snappy.* properties were globally scoped, +1 to fix in the latest patch.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12483796/HADOOP-7206new-c.patch
          against trunk revision 1139387.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/672//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/672//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/672//console

          This message is automatically generated.

          Eli Collins added a comment -

          I verified that with snappy-1.0.3 installed on my host a tarball built with ant -Dforrest.home=$FORREST_HOME -Dbundle.snappy=true compile-native tar correctly bundles the snappy lib. Also tested that TestCodec run from this tarball runs w/ native snappy enabled when it (and libhadoop) are available on the host.

          Tsz Wo Nicholas Sze added a comment -

          @Alejandro, thanks for adding the javadoc.

          Eli Collins added a comment -

          I've committed this. Thank you Issei and Alejandro!

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-maven #43 (See https://builds.apache.org/job/Hadoop-Common-trunk-maven/43/)

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #732 (See https://builds.apache.org/job/Hadoop-Common-trunk/732/)

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #668 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/668/)

          Vinod Kumar Vavilapalli added a comment -

          I back-ported the trunk-patch to branch-1 and here's the attachment. Tested it on a cluster.

          Owen O'Malley added a comment -

          Vinod's backport looks good to me. Any other comments?

          Giridharan Kesavan added a comment -

          The snappy.lib variable is hardcoded to /usr/lib, which may not be true for 64-bit libraries, where snappy is in /usr/lib64.

          Matt Foley added a comment -

          When this is ready, please commit to branch-1 and branch-1.0. Thanks!

          Vinod Kumar Vavilapalli added a comment -

          snappy.lib variable is hardcoded to /usr/lib which may not be true in the case of 64 bit lib where snappy is in /usr/lib64.

          Thanks for pointing this out, Giri. I just removed using explicit ends for compiling snappy. I guess we need this for trunk too, but we can do it later.

          Attaching patch, that should fix this.

          Giridharan Kesavan added a comment -

          Thanks for the quick fix, Vinod. I'm able to build Hadoop with snappy for both 32- and 64-bit platforms with patch HADOOP-7206-20120302.txt

          Giridharan Kesavan added a comment -

          here is my +1

          Vinod Kumar Vavilapalli added a comment -

          Thanks for the go signal, Matt. I just committed the back-ported patch to branch-1 and branch-1.0.

          Also, we need to set the fix version to be 1.0.2, but I cannot find it in the version list. I set it to 1.0.1 for now, can you please fix it? Thanks!


            People

            • Assignee: Alejandro Abdelnur
            • Reporter: Eli Collins
            • Votes: 13
            • Watchers: 54
