  1. Pig
  2. PIG-1824

Support import modules in Jython UDF


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      The module import state is captured before and after the user code is executed. The resolved modules are inspected and added to the pigContext, and then added to the job jar.

      This patch addresses the following import modes:
      - import re, which will (if configured) find re on the filesystem in the Jython install root
      - import foo (where foo can itself import bar), which now works provided bar is resolvable via JYTHON_HOME, JYTHONPATH, the current directory, etc.
      - from pkg import *, which works when the cachedir is writable
      - import non.jvm.class, which works when the cachedir is writable
      - the directly imported module may use schema decorators, but recursively imported modules cannot until PIG-1943 is addressed
    • Tags: jython, import
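The release note describes comparing the import state before and after the user code runs. A minimal sketch of that idea in plain Python follows; it is not the code from the attached patches. The function name run_and_collect_imports and its parameters are hypothetical, and the real implementation works against the pigContext and the job jar, which are not modeled here.

```python
import sys


def run_and_collect_imports(source, script_name="<udf>"):
    """Execute `source` and report the modules it newly imported.

    Returns a dict mapping module name to the file it was loaded from
    (None for built-in modules that have no file on disk).
    Hypothetical helper for illustration only.
    """
    before = set(sys.modules)              # import state before user code
    namespace = {"__name__": "__udf__"}    # isolated globals for the script
    exec(compile(source, script_name, "exec"), namespace)
    after = set(sys.modules)               # import state after user code

    resolved = {}
    for name in sorted(after - before):
        module = sys.modules.get(name)
        if module is None:                 # placeholder entry, nothing to ship
            continue
        resolved[name] = getattr(module, "__file__", None)
    return resolved
```

Each path in the returned dict is a candidate file to ship to the backend. A module that was already imported before the call does not appear in the result, which is why the snapshot has to be taken immediately before the user code runs.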

    Description

      Currently, a Jython UDF script doesn't support the Jython import statement, as in the following example:

      #!/usr/bin/python
      
      import re
      @outputSchema("word:chararray")
      def resplit(content, regex, index):
              return re.compile(regex).split(content)[index]
      

      Can Pig automatically locate the Jython module file and ship it to the backend? Or should we add a ship clause to let users explicitly specify the modules to ship?
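      To illustrate what "automatically locate the module file" could involve, here is a rough sketch using the CPython 3 standard library. It is an assumption-laden analogue, not what Pig does: Jython is a Python 2 implementation, so the importlib.util API used below would need a different equivalent there, and the helper name locate_module_file is hypothetical.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the path a top-level module would be loaded from, or None.

    None means the module cannot be found on the search path, or it is a
    built-in/frozen module with no file that could be shipped.
    Hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        return None
    return spec.origin
```

      For a top-level name such as re, this resolves the file without executing the module, so the lookup itself has no side effects. It does not discover transitive imports (foo importing bar); that requires either running the code and diffing the import state, or parsing the source.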

      Attachments

        1. 1824_final.patch
          30 kB
          Woody Anderson
        2. 1824.patch
          24 kB
          Woody Anderson
        3. 1824a.patch
          23 kB
          Woody Anderson
        4. 1824b.patch
          28 kB
          Woody Anderson
        5. 1824c.patch
          28 kB
          Woody Anderson
        6. 1824d.patch
          28 kB
          Woody Anderson
        7. 1824x.patch
          30 kB
          Woody Anderson
        8. TEST-org.apache.pig.test.TestGrunt.txt
          1.09 MB
          Alan Gates
        9. TEST-org.apache.pig.test.TestScriptLanguage.txt
          898 kB
          Alan Gates
        10. TEST-org.apache.pig.test.TestScriptUDF.txt
          192 kB
          Alan Gates


          People

            Assignee: Woody Anderson (woody.anderson@gmail.com)
            Reporter: Richard Ding (rding)
            Votes: 1
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved:
