Pig / PIG-1824

Support import modules in Jython UDF

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      The module import state is determined both before and after the user code is executed. The resolved modules are inspected and added to the pigContext, and then added to the job jar.

      This patch addresses the following import modes:
      - import re: finds re (if configured) on the filesystem in the Jython install root
      - import foo (where foo may itself import bar): works provided bar is resolvable via JYTHON_HOME, JYTHONPATH, the current directory, etc.
      - from pkg import *: works when the cachedir is writable
      - import non.jvm.class: works when the cachedir is writable

      The directly imported module may use schema decorators, but recursively imported modules cannot until PIG-1943 is addressed.
    • Tags:
      jython, import

      Description

      Currently, Jython UDF scripts do not support the Jython import statement, as in the following example:

      #!/usr/bin/python

      import re

      # Split content on the given regex and return the piece at position index.
      @outputSchema("word:chararray")
      def resplit(content, regex, index):
          return re.compile(regex).split(content)[index]

      Can Pig automatically locate the Jython module file and ship it to the backend? Or should we add a ship clause to let the user explicitly specify the modules to ship?
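The release note describes the approach that was adopted: compare the module import state before and after the user code runs, then inspect whatever was newly resolved. A minimal sketch of that idea in plain Python follows. It is not the code from the patch; the function name `modules_imported_by` and the callable `run_user_code` are hypothetical names chosen for illustration, and the step of adding the files to the pigContext and job jar is left out.

```python
import os
import sys


def modules_imported_by(run_user_code):
    """Return {module_name: source_path} for modules first imported
    while run_user_code() was executing.

    Hypothetical helper for illustration only; it mirrors the
    before/after comparison described in the release note.
    """
    # Snapshot the names of all modules loaded so far.
    before = set(sys.modules.keys())

    # Execute the user's script (here: any zero-argument callable).
    run_user_code()

    found = {}
    # Anything present now but absent before was pulled in by the user
    # code, either directly or through a recursive import.
    for name in set(sys.modules.keys()) - before:
        module = sys.modules.get(name)
        # Built-in modules and placeholder entries have no usable __file__.
        path = getattr(module, "__file__", None)
        if path:
            found[name] = os.path.abspath(path)
    return found
```

Calling `modules_imported_by(lambda: __import__("foo"))` for a module foo that imports bar would report both files, provided neither had been imported earlier in the process. Modules without a file on disk are skipped, since there is nothing to ship for them.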

      1. 1824_final.patch
        30 kB
        Woody Anderson
      2. 1824.patch
        24 kB
        Woody Anderson
      3. 1824a.patch
        23 kB
        Woody Anderson
      4. 1824b.patch
        28 kB
        Woody Anderson
      5. 1824c.patch
        28 kB
        Woody Anderson
      6. 1824d.patch
        28 kB
        Woody Anderson
      7. 1824x.patch
        30 kB
        Woody Anderson
      8. TEST-org.apache.pig.test.TestGrunt.txt
        1.09 MB
        Alan Gates
      9. TEST-org.apache.pig.test.TestScriptLanguage.txt
        898 kB
        Alan Gates
      10. TEST-org.apache.pig.test.TestScriptUDF.txt
        192 kB
        Alan Gates

            People

            • Assignee:
              Woody Anderson
              Reporter:
              Richard Ding
            • Votes:
              1
              Watchers:
              9
