PIG-1824

Support import modules in Jython UDF

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Module import state is captured before and after the user code is executed. The modules resolved in between are inspected and added to the pigContext, and are then added to the job jar.

      This patch addresses the following import modes:
      - import re, which will (if configured) find re on the filesystem in the Jython install root
      - import foo (which can itself import bar); this now works provided bar is resolvable via JYTHON_HOME, JYTHONPATH, the current directory, etc.
      - from pkg import *, which works when the cachedir is writable
      - import non.jvm.class, which works when the cachedir is writable

      Known limitation: the directly imported module may use schema decorators, but recursively imported modules cannot until PIG-1943 is addressed.
    • Tags:
      jython, import

      Description

      Currently, Jython UDF scripts do not support the Jython import statement, as in the following example:

      #!/usr/bin/python

      import re

      @outputSchema("word:chararray")
      def resplit(content, regex, index):
          # Split content on the regex and return the piece at the given index.
          return re.compile(regex).split(content)[index]

      Can Pig automatically locate the Jython module file and ship it to the backend? Or should we add a ship clause that lets users explicitly specify the modules to ship?
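
      The automatic approach can be illustrated by comparing the interpreter's module table before and after the user code runs. The following is only a minimal sketch in plain Python, not the code from the patch: the function name newly_imported_files and the run_user_code callback are hypothetical, and the step of registering the files with Pig and adding them to the job jar is left out.

```python
import sys


def newly_imported_files(run_user_code):
    """Return file paths of modules first imported while run_user_code() runs.

    Hypothetical helper for illustration; it is not part of Pig.
    """
    before = set(sys.modules)   # module names loaded before the user code
    run_user_code()
    after = set(sys.modules)    # module names loaded afterwards

    paths = []
    for name in sorted(after - before):
        module = sys.modules.get(name)
        # Built-in modules and placeholder entries have no __file__.
        path = getattr(module, "__file__", None)
        if path:
            paths.append(path)
    return paths
```

      Under these assumptions, a caller would pass a function that executes the UDF script and then ship every returned path to the backend. Modules that were already loaded before the call are not reported, which is why the snapshot has to be taken before any user code runs.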

        Attachments

        1. 1824.patch
          24 kB
          Woody Anderson
        2. 1824a.patch
          23 kB
          Woody Anderson
        3. 1824b.patch
          28 kB
          Woody Anderson
        4. 1824c.patch
          28 kB
          Woody Anderson
        5. 1824d.patch
          28 kB
          Woody Anderson
        6. 1824x.patch
          30 kB
          Woody Anderson
        7. TEST-org.apache.pig.test.TestGrunt.txt
          1.09 MB
          Alan Gates
        8. TEST-org.apache.pig.test.TestScriptLanguage.txt
          898 kB
          Alan Gates
        9. TEST-org.apache.pig.test.TestScriptUDF.txt
          192 kB
          Alan Gates
        10. 1824_final.patch
          30 kB
          Woody Anderson

              People

              • Assignee:
                woody.anderson@gmail.com Woody Anderson
                Reporter:
                rding Richard Ding
              • Votes:
                1
                Watchers:
                9
