Pig / PIG-1942

script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: None
    • Component/s: impl
    • Labels:

      Description

      from https://issues.apache.org/jira/browse/PIG-1824

      import re
      @outputSchema("y:bag{t:tuple(word:chararray)}")
      def strsplittobag(content, regex):
          return re.compile(regex).split(content)
      

      This does not work because split returns a list of strings, not a list of tuples. However, the output schema is known, so it would be quite simple to implicitly promote each string element to a single-field tuple.
      Likewise, list/array/tuple/set etc. are all equally convertible to a bag, and list/array/tuple are equally convertible to a Tuple; with the schema available, this conversion can be done in a much less rigid way.
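
      The idea can be sketched in plain Python. This is only an illustration, not the attached patch: the nested-tuple schema description (WORD_BAG) and the coerce helper are hypothetical stand-ins for Pig's real schema objects, and the result uses a list of tuples to represent a bag.

```python
import re

# Hypothetical, simplified schema description (NOT Pig's Schema class):
#   a scalar field is a type-name string, e.g. "chararray"
#   a tuple is ("tuple", [field_schema, ...])
#   a bag is ("bag", tuple_schema)
WORD_BAG = ("bag", ("tuple", ["chararray"]))


def coerce(value, schema):
    """Reshape a plain Python value to match the given schema description."""
    if isinstance(schema, str):
        # Scalar field: pass the value through unchanged.
        return value
    kind, inner = schema
    if kind == "tuple":
        # list/tuple values become a tuple; a lone scalar is promoted
        # to a single-field tuple.
        if isinstance(value, (list, tuple)):
            items = tuple(value)
        else:
            items = (value,)
        if len(items) != len(inner):
            raise ValueError("expected %d fields, got %d" % (len(inner), len(items)))
        return tuple(coerce(v, s) for v, s in zip(items, inner))
    if kind == "bag":
        # Any iterable collection (list, tuple, set, ...) becomes a list of tuples.
        return [coerce(v, inner) for v in value]
    raise ValueError("unknown schema kind: %r" % (kind,))


def strsplittobag(content, regex):
    # The UDF body stays as simple as in the report; the schema drives
    # the promotion of each string to a one-field tuple.
    return coerce(re.compile(regex).split(content), WORD_BAG)
```

      With this sketch, strsplittobag("a,b,c", ",") yields a list of one-field tuples. Without such a conversion, the UDF author has to wrap each element by hand, for example [(w,) for w in words].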

      This allows much easier reuse of existing Python code and avoids the memory overhead of creating intermediate objects just to re-convert types.
      I wrote the code to do this a while back as part of my version of the Jython script framework; I'll isolate that and attach it.

      1. 1942.patch
        37 kB
        Woody Anderson
      2. 1942_with_junit.patch
        65 kB
        Woody Anderson


          People

          • Assignee:
            Woody Anderson
            Reporter:
            Woody Anderson
          • Votes:
            0
            Watchers:
            7
