PIG-1942: Script UDF (Jython) should use the intended output schema to convert Python objects to Pig objects more directly



    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: None
    • Component/s: impl
    • Labels:


      From https://issues.apache.org/jira/browse/PIG-1824:

      import re

      def strsplittobag(content, regex):
          # re.split returns a plain list of strings, not a bag of tuples
          return re.compile(regex).split(content)

      This does not work because split returns a list of strings, whereas a bag must contain tuples. However, the output schema is known, so it would be quite simple to implicitly promote each string element to a single-field tuple.
      Also, a list, array, tuple, set, etc. are all equally convertible to a bag, and a list, array, or tuple is equally convertible to a Tuple. With the schema available, this conversion can be done in a much less rigid way.

      This makes it much easier to reuse existing Python code and avoids the memory overhead of creating intermediate objects just to re-convert types.
      I wrote the code to do this a while back as part of my version of the Jython script framework; I'll isolate it and attach it.
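
      To illustrate the idea, here is a minimal sketch of schema-directed conversion in plain Python. It is not the attached patch and does not use Pig's actual classes: the nested schema encoding and the to_pig helper are hypothetical names invented for this example, and a real implementation would build Pig Tuple and DataBag objects rather than Python tuples and lists.

```python
import re

# Hypothetical schema encoding, for illustration only (not Pig's Schema class):
#   "chararray", "int", ...        -> scalar field
#   ("tuple", [field, field, ...]) -> tuple with the given field schemas
#   ("bag", ("tuple", [...]))      -> bag whose elements follow the tuple schema


def to_pig(value, schema):
    """Convert a Python value into the shape described by schema.

    Python tuples and lists stand in for Pig Tuples and bags here.
    """
    if isinstance(schema, str):
        # Scalar field: pass the value through unchanged.
        return value

    kind, inner = schema
    if kind == "tuple":
        if isinstance(value, (list, tuple)):
            if len(value) != len(inner):
                raise ValueError("tuple arity does not match schema")
            return tuple(to_pig(v, s) for v, s in zip(value, inner))
        if len(inner) == 1:
            # Implicit promotion: a bare element becomes a one-field tuple.
            return (to_pig(value, inner[0]),)
        raise TypeError("cannot convert %r to a tuple" % (value,))
    if kind == "bag":
        # Any iterable (list, tuple, set, ...) is acceptable as a bag.
        return [to_pig(item, inner) for item in value]
    raise ValueError("unknown schema kind: %r" % (kind,))


def strsplittobag(content, regex):
    # Same function as in the issue: returns a plain list of strings.
    return re.compile(regex).split(content)


# Assumed output schema for the UDF: a bag of one-field chararray tuples.
BAG_OF_WORDS = ("bag", ("tuple", ["chararray"]))

result = to_pig(strsplittobag("a,b,c", ","), BAG_OF_WORDS)
print(result)  # [('a',), ('b',), ('c',)]
```

      The design point is that the UDF body stays ordinary Python; only the framework's return-value conversion consults the schema, so no intermediate list of tuples has to be built by hand inside the UDF.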


        1. 1942.patch
          37 kB
          Woody Anderson
        2. 1942_with_junit.patch
          65 kB
          Woody Anderson



            • Assignee:
              woody.anderson@gmail.com Woody Anderson
            • Votes:
              0
            • Watchers:
              7


              • Created: