  1. Pig
  2. PIG-2101

Registering a Python function in a directory other than the current working directory fails

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1
    • Fix Version/s: None
    • Component/s: impl
    • Labels: None

      Description

      In MapReduce mode, if the register command references a directory other than the current one, executing the Python UDF on the backend fails with:

      Deserialization error: could not instantiate 'org.apache.pig.scripting.jython.JythonFunction' with arguments '[../udfs/python/production.py, production]'

      I assume the backend is using the path exactly as it was registered on the front end to try to locate the UDF.
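
The assumption above can be illustrated with a small sketch. It is not Pig code; it only shows how the same registered string would resolve to different files if it were interpreted relative to two different working directories. Both directory names are hypothetical, chosen purely for illustration.

```python
import posixpath


def resolve(cwd, registered_path):
    """Resolve a registered path the way a process running in cwd would."""
    return posixpath.normpath(posixpath.join(cwd, registered_path))


# Path taken from the register statement in this report.
REGISTERED = "../udfs/python/production.py"

# Hypothetical working directories (assumptions, not taken from the report).
FRONTEND_CWD = "/home/alice/scripts"
BACKEND_CWD = "/tmp/task_0001/work"

frontend_target = resolve(FRONTEND_CWD, REGISTERED)
backend_target = resolve(BACKEND_CWD, REGISTERED)

print(frontend_target)
print(backend_target)
```

If the backend does re-resolve the string in this way, the second path would not exist on the task node, which would be consistent with the instantiation failure.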

      The script is:

      register '../udfs/python/production.py' using jython as bballudfs;
      players  = load 'baseball' as (name:chararray, team:chararray,
                      pos:bag{t:(p:chararray)}, bat:map[]);
      nonnull  = filter players by bat#'slugging_percentage' is not null and
                      bat#'on_base_percentage' is not null;
      calcprod = foreach nonnull generate name, bballudfs.production(
                      (float)bat#'slugging_percentage',
                      (float)bat#'on_base_percentage');
      dump calcprod;
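
The report does not include production.py itself. For anyone trying to reproduce this, a minimal stand-in might look like the sketch below. It assumes that production is simply slugging percentage plus on-base percentage, and that Pig's Jython engine makes the outputSchema decorator available when the file is registered; the fallback definition is my own addition so the file can also be imported outside Pig.

```python
# production.py (hypothetical contents; the actual file is not part of this report)

try:
    outputSchema  # expected to be supplied by Pig's Jython engine (assumption)
except NameError:
    # Fallback so the module can be imported and tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("production:float")
def production(slugging_pct, onbase_pct):
    """Return slugging percentage plus on-base percentage."""
    return slugging_pct + onbase_pct
```

The script filters out null values before calling the UDF, so this sketch does not handle None inputs.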
      

          Activity

          doeklund Daniel Eklund added a comment -

          I can still get it to work with a relative path that does not involve '..'. For instance:
          Register 'test/simple.py' as myNamespace;

          where test is a subdirectory of the working directory. But any path containing '..' fails.
          It would also be nice to add something to the documentation about NOT using absolute paths.

          doeklund Daniel Eklund added a comment -

          As per:
          http://mail-archives.apache.org/mod_mbox/pig-user/201106.mbox/browser

          While this is implicit in the title of this bug, deserialization on the back end fails for any Python file that is registered on the front end from anything other than a child-relative directory.

          The functionality should support all directory references: absolute, parent-relative (e.g. '../..'), and child-relative.
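
To make the three cases concrete, here is a small sketch that classifies a registered path. The function name and category labels are my own, not anything in Pig; it could serve as a starting point for a test matrix when verifying a fix.

```python
import posixpath


def classify_register_path(path):
    """Classify a registered path as absolute, parent-relative, or child-relative."""
    if posixpath.isabs(path):
        return "absolute"
    # Normalize first so that something like 'test/../../x.py' is recognized
    # as escaping the working directory.
    parts = posixpath.normpath(path).split("/")
    if parts[0] == "..":
        return "parent-relative"
    return "child-relative"


# Paths from this issue, plus one hypothetical absolute path.
for candidate in ("../udfs/python/production.py",
                  "test/simple.py",
                  "/opt/udfs/production.py"):
    print(candidate, "->", classify_register_path(candidate))
```

Per the comments on this issue, only the child-relative case is reported to work at present.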


            People

            • Assignee: Unassigned
            • Reporter: Alan Gates (alangates)
            • Votes: 0
            • Watchers: 3
