  1. Pig
  2. PIG-1942

Script UDFs (Jython) should use the intended output schema to convert Python objects to Pig objects more directly


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: None
    • Component/s: impl

    Description

      From https://issues.apache.org/jira/browse/PIG-1824:

      import re

      @outputSchema("y:bag{t:tuple(word:chararray)}")
      def strsplittobag(content, regex):
          return re.compile(regex).split(content)

      This does not work because split returns a list of strings, whereas the declared schema is a bag of tuples. However, the output schema is known, so it would be quite simple to implicitly promote each string element to a single-field tuple.
      Also, a list, array, tuple, set, etc. are all equally convertible to a bag, and a list, array, or tuple is equally convertible to a Tuple. With the schema available, this conversion can be done in a much less rigid way.
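The conversion described above could be sketched roughly as follows. This is only an illustration and not the code from the attached patches: the nested `("bag", ...)` / `("tuple", [...])` schema representation and the function names `coerce` and `to_pig_tuple` are hypothetical, and plain Python lists and tuples stand in for Pig's bag and Tuple objects.

```python
# Hypothetical, simplified schema representation (NOT Pig's Schema class):
#   "chararray", "int", ...        -> scalar field
#   ("tuple", [field_schema, ...]) -> tuple with the given fields
#   ("bag", tuple_schema)          -> bag whose elements follow tuple_schema
# Python lists/tuples stand in for Pig bags/Tuples in this sketch.

def to_pig_tuple(value, field_schemas):
    """Convert a list/tuple, or promote a bare value, to a tuple."""
    if isinstance(value, (list, tuple)):
        items = list(value)
    else:
        # Implicit promotion: a bare element becomes a one-field tuple.
        items = [value]
    if len(items) != len(field_schemas):
        raise ValueError("expected %d fields, got %d"
                         % (len(field_schemas), len(items)))
    return tuple(coerce(v, s) for v, s in zip(items, field_schemas))


def coerce(value, schema):
    """Convert a Python value according to the declared output schema."""
    if isinstance(schema, tuple):
        kind, inner = schema
        if kind == "bag":
            # Any iterable (list, tuple, set, generator) can feed a bag.
            _, field_schemas = inner
            return [to_pig_tuple(item, field_schemas) for item in value]
        if kind == "tuple":
            return to_pig_tuple(value, inner)
        raise ValueError("unknown schema kind: %r" % (kind,))
    # Scalar field: passed through unchanged in this sketch.
    return value


# Schema equivalent to "y:bag{t:tuple(word:chararray)}"
word_bag = ("bag", ("tuple", ["chararray"]))
words = coerce(["a", "b", "c"], word_bag)
```

Here `words` holds one single-field tuple per string, which is the shape the bag schema asks for. Because the schema drives the conversion, the UDF author can return whatever container the existing Python code already produces.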

      This allows much easier reuse of existing Python code and avoids the memory overhead of creating intermediate objects just to re-convert types.
      I wrote the code to do this a while back as part of my own version of the Jython script framework; I'll isolate it and attach it.
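For comparison, a manual workaround of the kind this improvement is meant to make unnecessary might look like the sketch below, where each string is wrapped in a one-element tuple so the result matches the bag-of-tuples schema. The `outputSchema` fallback is a hypothetical stand-in so the snippet can also be run outside Pig, on the assumption that Pig's Jython engine supplies the real decorator.

```python
import re

try:
    outputSchema  # assumed to be supplied by Pig's Jython script engine
except NameError:
    # Hypothetical stand-in so the example runs outside Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    # Wrap each string in a one-field tuple to match the schema;
    # this builds the intermediate list the description wants to avoid.
    return [(word,) for word in re.compile(regex).split(content)]
```

Calling `strsplittobag("a,b,c", ",")` returns `[('a',), ('b',), ('c',)]`. The extra comprehension is exactly the per-UDF boilerplate that schema-driven conversion would remove.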

      Attachments

        1. 1942.patch
          37 kB
          Woody Anderson
        2. 1942_with_junit.patch
          65 kB
          Woody Anderson

          People

            Assignee: Woody Anderson (woody.anderson@gmail.com)
            Reporter: Woody Anderson (woody.anderson@gmail.com)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated: