PIG-2745

Pig e2e test RubyUDFs fails in MR mode when running from tarball

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.11, 0.10.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      To reproduce the issue, run the e2e test "RubyUDFs_1" in MR mode from the tarball, not from an installed Pig (the reason is explained below). Either a pseudo-distributed or a fully distributed Hadoop cluster can be used.

      ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
      

      The test fails with the following error:

      java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
      

      Looking at the job jar generated by Pig, "scriptingudfs.rb" can be found as follows:

      [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
        2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
      

      According to the getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" is supposed to be read from the job jar, but it is not. The reason is that getResourceAsStream("/x") looks for the resource "x" (without the leading "/"), not "/x". Since "scriptingudfs.rb" is stored under its absolute path, leading "/" included, getResourceAsStream(scriptPath) does not find it.

      File file = new File(scriptPath);
      if (file.exists()) {
          try {
              is = new FileInputStream(file);
          } catch (FileNotFoundException e) {
              throw new IllegalStateException("could not find existing file "+scriptPath, e);
          }
      } else {
          if (file.isAbsolute()) {
              is = ScriptEngine.class.getResourceAsStream(scriptPath);
          } else {
              is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
          }
      }
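
      The lookup mismatch can be reproduced outside Pig. The following is a minimal sketch, not Pig code: the class name, the helper names, and the path /home/user/ruby/scriptingudfs.rb are made up for illustration, and a plain URLClassLoader stands in for whatever class loader serves the job jar. The leading "/" is stripped by hand to mirror the documented rule of Class.getResourceAsStream for absolute names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarEntryLookupDemo {

    // Writes a temporary jar containing a single entry with the given name.
    public static Path writeJar(String entryName) throws IOException {
        Path jar = Files.createTempFile("lookup-demo", ".jar");
        jar.toFile().deleteOnExit();
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry(entryName));
            jos.write("# placeholder script\n".getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return jar;
    }

    // Mirrors only the absolute-name rule of Class.getResourceAsStream:
    // for a name starting with "/", the resource name is the part after it.
    public static String absoluteResourceName(String name) {
        return name.startsWith("/") ? name.substring(1) : name;
    }

    // Returns true if requestedName resolves in a jar that holds only entryName.
    public static boolean found(String entryName, String requestedName) throws IOException {
        Path jar = writeJar(entryName);
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { jar.toUri().toURL() }, null);
             InputStream is =
                 loader.getResourceAsStream(absoluteResourceName(requestedName))) {
            return is != null;
        }
    }

    public static void main(String[] args) throws IOException {
        String scriptPath = "/home/user/ruby/scriptingudfs.rb"; // hypothetical path
        // Entry stored under the absolute path, as in bad.jar.
        System.out.println("stored with leading '/':    found = "
                + found(scriptPath, scriptPath));
        // Entry stored without the leading "/", as in good.jar.
        System.out.println("stored without leading '/': found = "
                + found(scriptPath.substring(1), scriptPath));
    }
}
```

      In this sketch only the second jar should resolve the script, which matches the difference between bad.jar and good.jar in this report.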
      

      In fact, the test passes if you run it in local mode or from an installed Pig, because "scriptingudfs.rb" is then found on the local file system (e.g. /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb). In that case the file.exists() branch is taken and the job jar is never consulted.

      The fix seems straightforward. Attached is the patch that removes the leading "/" when registering UDF scripts so that they are stored without the leading "/" in the job jar as follows:

      [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
        2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
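
      The idea behind the fix can be sketched as a small name normalization. This is an illustration only and not the content of the attached patch: the class JarEntryNames and the method toJarEntryName are hypothetical names, and the sketch assumes "/"-separated paths.

```java
public class JarEntryNames {

    // Hypothetical helper: drops any leading '/' characters so the script is
    // stored in the job jar under a name that a resource lookup can match.
    public static String toJarEntryName(String scriptPath) {
        int start = 0;
        while (start < scriptPath.length() && scriptPath.charAt(start) == '/') {
            start++;
        }
        return scriptPath.substring(start);
    }

    public static void main(String[] args) {
        // Path taken from the error message in this report.
        String scriptPath =
            "/home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb";
        System.out.println(toJarEntryName(scriptPath));
    }
}
```

      A path that is already relative passes through unchanged, so applying the helper to every registered script would be harmless under these assumptions.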
      

      Thanks!

      1. PIG-2745.patch
        0.7 kB
        Cheolsoo Park
      2. PIG-2745-2.patch
        0.7 kB
        Cheolsoo Park
      3. Test001.java
        2 kB
        Daniel Dai
      4. enable_scripting_tests_23.patch
        5 kB
        Daniel Dai

            People

            • Assignee: Cheolsoo Park
            • Reporter: Cheolsoo Park
            • Votes: 0
            • Watchers: 3
