PIG-2927

SHIP and use JRuby gems in JRuby UDFs

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.11
    • Fix Version/s: 0.14.0
    • Component/s: parser
    • Labels:
      None
    • Environment:

      JRuby UDFs

      Description

      It would be great to use JRuby gems in JRuby UDFs without installing them on all machines on the cluster. Some way to SHIP them automatically with the job would be great.

      1. PIG-2927-0.patch
        22 kB
        Jonathan Coveney
      2. PIG-2927-1.patch
        20 kB
        Jonathan Coveney
      3. PIG-2927-2.patch
        21 kB
        Jonathan Coveney
      4. PIG-2927-3.patch
        23 kB
        Jonathan Coveney
      5. PIG-2927-4.patch
        48 kB
        Jonathan Coveney

        Issue Links

          Activity

          Russell Jurney created issue -
          Dmitriy V. Ryaboy added a comment -

          Russell, "critical" is for critical bugs, not for desired new features. Examples of something critical: Pig produces wrong data, Pig doesn't compile, Pig exhibits a 10x performance regression.

          For further elucidation, I refer you to Wikipedia: http://en.wikipedia.org/wiki/The_Boy_Who_Cried_Wolf

          Dmitriy V. Ryaboy made changes -
          Field Original Value New Value
          Priority Critical [ 2 ] Minor [ 4 ]
          Assignee Jonathan Coveney [ jcoveney ]
          Russell Jurney added a comment -

          Anything other than "Major", aka "never implement me."

          Russell Jurney added a comment -

          Moving from email.

          http://stackoverflow.com/a/10083594/13969 shows you the installed path of a given gem.

          jruby -S gem install mail
          jirb
          require 'rubygems'
          spec = Gem::Specification.find_by_name("mail")
          spec.gem_dir

          But really... we can just use Bundler, right? It's perfect: it expects a user to create their own gem environment (http://gembundler.com/). Then there is a list of gems in a JRuby file, and we can go through that list and get the installed location of each gem in our Bundler environment (which is likely to be in a subdirectory).
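
As a rough illustration of the lookup described above, the following sketch walks a gem and its runtime dependencies and records where each one is installed. The helper name `gem_dirs_for` is hypothetical and is not taken from any patch on this issue; it uses only standard RubyGems calls and assumes the gems are installed in the local environment.

```ruby
require 'rubygems'

# Hypothetical helper: map a gem and its runtime dependencies
# to their local install directories.
def gem_dirs_for(name, seen = {})
  return seen if seen.key?(name)
  spec = Gem::Specification.find_by_name(name)
  seen[name] = spec.gem_dir
  # Recurse into runtime dependencies so they can be located too.
  spec.runtime_dependencies.each { |dep| gem_dirs_for(dep.name, seen) }
  seen
rescue Gem::LoadError
  # The gem (or one of its dependencies) is not installed locally; skip it.
  seen
end

gem_dirs_for("mail").each { |gem_name, dir| puts "#{gem_name}: #{dir}" }
```

If "mail" is not installed, the helper simply returns an empty Hash and nothing is printed.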

          Russell Jurney made changes -
          Assignee Russell Jurney [ russell.jurney ]
          Affects Version/s 0.11 [ 12318878 ]
          Environment JRuby UDFs
          Component/s parser [ 12315409 ]
          Russell Jurney added a comment -

          I'm hearing that I can REGISTER these, as Jython does. I'll look into that.

          Jonathan Coveney made changes -
          Assignee Russell Jurney [ russell.jurney ] Jonathan Coveney [ jcoveney ]
          Jonathan Coveney added a comment -

          I just attached a patch that adds gem support. Any script registered as so:

          register script.rb using jruby as myfuncs;

          will be instantiated locally, and the set of dependencies across scripts will be shipped.

          Note:

          • JRUBY_HOME must be set on the client side. This is so we know where to find the gems!
          • You need to have a "require 'pigudf'" in the script, since it is instantiated on the client side (I could probably fix this, but I think forcing it is desirable for testing of scripts, etc.).
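
One way to picture "instantiate locally, then ship what was loaded" is to compare the set of activated gems before and after the script is evaluated. This is only a sketch of the idea under that assumption, not the code in the attached patch; `gems_activated_by` is a hypothetical name.

```ruby
require 'rubygems'

# Hypothetical helper: report which gems were activated while the block ran,
# mapped to their local install directories.
def gems_activated_by
  before = Gem.loaded_specs.keys
  yield
  activated = Gem.loaded_specs.keys - before
  activated.map { |name| [name, Gem.loaded_specs[name].gem_dir] }.to_h
end

# Evaluating a UDF script inside the block would surface the gems it requires.
deps = gems_activated_by { }   # empty block: nothing new is activated
puts deps.inspect
```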

          Russell,

          Any chance you can try this out and see if it works?

          I'll add tests sometime...

          Jonathan Coveney made changes -
          Attachment PIG-2927-0.patch [ 12547473 ]
          Jonathan Coveney added a comment -

          Oh, and of course, you need to have the gem (and dependencies) installed locally to wherever JRUBY_HOME is set.

          Russell Jurney added a comment -

          Thanks, will test this by next week.

          Russell Jurney added a comment -

          Testing now...

          Russell Jurney added a comment -

          Patch does not apply to trunk, please advise:

          Russells-MacBook-Pro:pig-trunk rjurney$ patch -p0 <PIG-2927-0.patch
          patching file build.xml
          patching file ivy/libraries.properties
          patching file src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
          Reversed (or previously applied) patch detected! Assume -R? [n] y
          Hunk #5 FAILED at 107.
          Hunk #6 FAILED at 659.
          2 out of 6 hunks FAILED – saving rejects to file src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java.rej
          patching file src/org/apache/pig/impl/util/JarManager.java
          patching file src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
          patching file src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
          patching file src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java
          patching file src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java

          Jonathan Coveney made changes -
          Attachment PIG-2927-1.patch [ 12549227 ]
          Jonathan Coveney added a comment -

          Russell, I pulled trunk and made a new patch. Should apply fine now.

          Russell Jurney added a comment -

          Thanks, patch applied. Testing now.

          Russell Jurney added a comment -

          No build

          [echo] *** Building Main Sources ***
          [echo] *** To compile with all warnings enabled, supply -Dall.warnings=1 on command line ***
          [echo] *** If all.warnings property is supplied, compile-sources-all-warnings target will be executed ***
          [echo] *** Else, compile-sources (which only warns about deprecations) target will be executed ***

          compile-sources:
          [javac] /Users/rjurney/Software/pig-trunk/build.xml:503: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
          [javac] Compiling 14 source files to /Users/rjurney/Software/pig-trunk/build/classes
          [javac] /Users/rjurney/Software/pig-trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java:51: package org.apache.pig.tar does not exist
          [javac] import org.apache.pig.tar.TarUtils;
          [javac] ^
          [javac] /Users/rjurney/Software/pig-trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java:170: cannot find symbol
          [javac] symbol : variable TarUtils
          [javac] location: class org.apache.pig.scripting.jruby.JrubyScriptEngine
          [javac] TarUtils.tarFile(GEM_DIR_BASE_NAME, f.getParentFile(), new File(f, "lib"), os);
          [javac] ^
          [javac] Note: /Users/rjurney/Software/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java uses or overrides a deprecated API.
          [javac] Note: Recompile with -Xlint:deprecation for details.
          [javac] 2 errors

          BUILD FAILED
          /Users/rjurney/Software/pig-trunk/build.xml:451: The following error occurred while executing this line:
          /Users/rjurney/Software/pig-trunk/build.xml:503: Compile failed; see the compiler error output for details.

          Total time: 1 minute 0 seconds

          Jonathan Coveney added a comment -

          Russell, thanks for your help on this. I think it was picking up the dep from my environment. It should pull it from Maven now.

          Jonathan Coveney made changes -
          Attachment PIG-2927-2.patch [ 12549233 ]
          Russell Jurney added a comment -

          Maven is on drugs. I try with stuff I had, no work. I pull from trunk, apply again, no work. Why can't it see the tar package?

          compile-sources:
          [javac] /Users/rjurney/Software/pig-trunk/build.xml:504: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
          [javac] Compiling 16 source files to /Users/rjurney/Software/pig-trunk/build/classes
          [javac] /Users/rjurney/Software/pig-trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java:51: package org.apache.pig.tar does not exist
          [javac] import org.apache.pig.tar.TarUtils;
          [javac] ^
          [javac] /Users/rjurney/Software/pig-trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java:170: cannot find symbol
          [javac] symbol : variable TarUtils
          [javac] location: class org.apache.pig.scripting.jruby.JrubyScriptEngine
          [javac] TarUtils.tarFile(GEM_DIR_BASE_NAME, f.getParentFile(), new File(f, "lib"), os);
          [javac] ^
          [javac] Note: /Users/rjurney/Software/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java uses or overrides a deprecated API.
          [javac] Note: Recompile with -Xlint:deprecation for details.
          [javac] 2 errors

          BUILD FAILED
          /Users/rjurney/Software/pig-trunk/build.xml:451: The following error occurred while executing this line:
          /Users/rjurney/Software/pig-trunk/build.xml:504: Compile failed; see the compiler error output for details.

          Total time: 14 seconds

          Jonathan Coveney added a comment -

          It wasn't Maven. It was me overlooking the file in the git branch. Whoops. Thanks for your patience, Russell.

          Jonathan Coveney made changes -
          Attachment PIG-2927-3.patch [ 12549243 ]
          Russell Jurney added a comment -

          It builds! Ok now I test.

          Russell Jurney made changes -
          Fix Version/s 0.11 [ 12318878 ]
          Russell Jurney added a comment -

          Initial test works:

          register 'test.rb' using jruby as myfuncs;

          A = load 'test/data/pigunit/top_queries_input_data.txt';
          B = foreach A generate myfuncs.concat($0, $1);

          require 'rubygems'
          require 'pigudf'
          require 'mail'

          class Myudfs < PigUdf
            def concat *input
              input.inject(:+)
            end
          end
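
The body of `concat` can also be checked without Pig. The sketch below uses a stand-in class (`ConcatCheck`, a hypothetical name) that does not subclass PigUdf, so it runs under plain Ruby and exercises only the `inject(:+)` logic, not gem shipping.

```ruby
# Stand-in for the UDF body only; it does not subclass PigUdf,
# so it can run without Pig on the load path.
class ConcatCheck
  def concat(*input)
    # Folds the arguments together with +, so strings are concatenated.
    input.inject(:+)
  end
end

puts ConcatCheck.new.concat("foo", "bar")  # prints "foobar"
```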

          Russell Jurney added a comment -

          Jon: Should I write e2e tests or unit tests? Not sure how to test this, can you advise? Don't mind doing the work.

          Jonathan Coveney added a comment -

          Russell,

          Make sure you try running stuff on the cluster. Otherwise it would have already worked, since it should pick up the gems from your environment.

          As far as testing, that's tough. This is about shipping gems which you have and the server doesn't, but it is awkward to require that someone have gems in order to run the tests. I suppose we could download gems on the client, and then use them in M/R mode. Need to think on it beyond that.

          Rohini, Cheolsoo,

          I've added you as watchers to this not because I like annoying you, but since you guys have a lot more experience testing and keeping builds green than I do, I thought you might have some ideas.

          Russell Jurney added a comment -

          Before, JRuby gems did not actually work in UDFs even in local mode, and now they do, but that could be the effect of my setting $JRUBY_HOME. I'll test on the cluster.

          Cheolsoo Park added a comment -

          Although I am no Ruby expert, I think that Jonathan's patch works well. Here is my test.

          1) installed a non-trivial rubygem library (rubygem-json) on the client only and confirmed that it is not installed on any datanode on the cluster.

          /usr/lib/ruby/gems/1.8/gems/json-1.4.6/
          

          2) wrote a Ruby UDF that parses a JSON string:

          require 'rubygems'
          require 'pigudf'
          require 'json'
          
          class Myudfs < PigUdf
             outputSchema "result:chararray"
             def parseJson input
                result = JSON.parse(input)
             end
          end
          

          3) wrote a short Pig script that loads a JSON string and calls my Ruby UDF:

          register 'test.rb' using jruby as myfuncs;
          a = load 'json.txt' using PigStorage() as (i:chararray);
          b = foreach a generate myfuncs.parseJson(i);
          dump b;
          

          4) got the expected result as follows:

          input
          {"id":1,"nested":{"value1":"first1","next":{"complex_record":{"id":2,"nested":{"value1":"second1","next":null,"value2":"second2"}}},"value2":"first2"}}
          
          result
          ([id#1,nested#{value1=first1, value2=first2, next={complex_record={id=2, nested={value1=second1, value2=second2, next=null}}}}])
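
For comparison, the same input can be parsed in plain Ruby outside Pig. This sketch uses the json library that ships with standard Ruby rather than the client-installed gem from step 1, and only shows that `JSON.parse` returns a nested Hash, which is consistent with the map-shaped result above.

```ruby
require 'json'

# Input line from the test above.
line = '{"id":1,"nested":{"value1":"first1","next":{"complex_record":{"id":2,"nested":{"value1":"second1","next":null,"value2":"second2"}}},"value2":"first2"}}'

parsed = JSON.parse(line)   # nested Hash with String keys
inner  = parsed["nested"]["next"]["complex_record"]

puts parsed["id"]                      # 1
puts inner["nested"]["value1"]         # second1
puts inner["nested"]["next"].inspect   # nil (JSON null)
```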
          

          Without Jonathan's patch, I get the following error in the front-end as expected:

          LoadError: no such file to load -- json
            require at org/jruby/RubyKernel.java:1042
            require at file:/home/cheolsoo/pig-ruby/build/ivy/lib/Pig/jruby-complete-1.6.7.jar!/META-INF/jruby.home/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36
             (root) at test.rb:3
          2012-10-18 17:09:24,323 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. (LoadError) no such file to load -- json
          

          I also ran the "Scripting" e2e test cases with the patch on a Hadoop-1.0.x cluster, and they all passed. So it seems good to commit to me.

          Btw, I wanted to write an e2e test case using rubygem-json, but I realized that rubygem-json is under GPL and can't be included in Pig. We should either find another rubygem library that is under the Apache licence or make the test configurable so that it will run only if rubygem-json is installed.
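
The "run only if the gem is installed" option could be guarded with a check along these lines. This is a sketch only: `gem_available?` is a hypothetical helper, and how the e2e harness would actually skip a test is not shown here.

```ruby
require 'rubygems'

# Hypothetical guard: true when the named gem is installed locally.
def gem_available?(name)
  Gem::Specification.find_by_name(name)
  true
rescue Gem::LoadError
  # Raised when no installed gem matches the name.
  false
end

if gem_available?("json")
  puts "json gem found; the gem-dependent test could run"
else
  puts "json gem not installed; the gem-dependent test would be skipped"
end
```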

          Thanks!

          Jonathan Coveney added a comment -

          Thanks for taking a look, Cheolsoo!

          A test would be awesome. IMHO we should try to use a non-GPL gem. For a test, there is no reason not to, since we control all the variables. In the future, if it is unavoidable, we can work around that.

          Jonathan Coveney added a comment -

          Cheolsoo,

          Was wondering what you think is necessary as far as getting this into 0.11. JRuby support is still evolving/experimental, so I think assuming this feature doesn't blow anything up (it doesn't), then the bar to putting it in doesn't need to be unduly high.

          The e2e test you proposed seems like it'd be easy to do.

          I need to update this patch to use the released version of 1.7.0, but besides that, would love thoughts.

          Jon

          Cheolsoo Park added a comment -

          Hi Jon.

          I am comfortable with your patch. Why don't we open a jira for adding an e2e test case for this feature and let this into 0.11?

          Sounds reasonable? Let me know if anyone has any concerns.

          Thanks!

          Russell Jurney added a comment -

          This sounds great!

          Jonathan Coveney added a comment -

          I radically refactored it because I was getting a lot of annoying errors. It could be refactored further (and made a lot cleaner), but would rather have some eyes on it and testing it. It now passes both the unit tests and the e2e tests (though for the e2e tests to pass, JRUBY_HOME has to be set in your environment...).

          Would love thoughts. I focused on trying to get it to work robustly instead of making it beautiful. I can refactor to make it beautiful later.

          Jonathan Coveney made changes -
          Attachment PIG-2927-4.patch [ 12552141 ]
          Julien Le Dem added a comment -

          This will go in the next release as we are stabilizing the 0.11 branch

          Julien Le Dem made changes -
          Fix Version/s 0.12 [ 12323380 ]
          Fix Version/s 0.11 [ 12318878 ]
          Dmitriy V. Ryaboy added a comment -

          Jon, can you wrap this one up?

          Dan Harvey added a comment -

          I've had a look over this and I think I get the general approach. We're going over all the gems loaded by the required Ruby file, and packaging the locally installed copies into the Jar for the UDF to use?

          If so, I'm not sure that's the best approach. In Ruby you generally use a Gemfile with Bundler to specify where you want to get the dependencies from and in what version, then a require in the code to specify what that code specifically needs. By loading gems locally we end up with lots of issues around having the correct version installed and used by default locally, which is not always the case. You also can't lock the versions used in committed code, which you can do with a Gemfile.lock file.
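
To make the version concern concrete, the sketch below checks candidate versions against a pinned constraint of the kind a Gemfile would carry. The constraint `~> 2.5` and the version numbers are purely illustrative and are not taken from this issue; only standard RubyGems classes are used.

```ruby
require 'rubygems'

# Hypothetical pin, written in the style of a Gemfile constraint.
pinned = Gem::Requirement.new('~> 2.5')

# '~> 2.5' accepts 2.5 and later 2.x releases, but not 3.0.
puts pinned.satisfied_by?(Gem::Version.new('2.5.4'))  # true
puts pinned.satisfied_by?(Gem::Version.new('3.0.0'))  # false
```

A locally installed gem that falls outside the pinned range is exactly the case where shipping "whatever is installed" and shipping "what the Gemfile asks for" would differ.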

          What I've done to work with Ruby UDFs in Pig at the moment is use Warbler to package up a Jar of dependencies from a Gemfile, then also register this Jar in the Pig script. This way I can quickly change where I pull dependencies from, whether that is local, GitHub, different branches on GitHub, etc.

          https://github.com/EqualMedia/warbler/tree/just_jar_gemfile

          j = Warbler::Jar.new
          j.apply(Warbler::Config.new)
          j.create("gem.jar")
          

          As the changes for moving to JRuby 1.7 are in here too, I think it makes sense to split the dependency uploading out from the move to JRuby 1.7. The 1.7 change is probably quicker to get in? Then I could maybe look at adding Bundler support, if we think that's the preferred approach.

          Dan Harvey made changes -
          Link This issue relates to PIG-3298 [ PIG-3298 ]
          Daniel Dai made changes -
          Fix Version/s 0.13.0 [ 12324971 ]
          Fix Version/s 0.12.0 [ 12323380 ]
          Aniket Mokashi made changes -
          Fix Version/s 0.14.0 [ 12326954 ]
          Fix Version/s 0.13.0 [ 12324971 ]

            People

            • Assignee:
              Jonathan Coveney
              Reporter:
              Russell Jurney
            • Votes:
              0
              Watchers:
              9
