[PIG-1942] script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 0.8.0, 0.9.0
Fix Version/s: None
Component/s: impl
Labels:
- python
- schema
- udf

Description

The following example is from https://issues.apache.org/jira/browse/PIG-1824:

import re
@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    return re.compile(regex).split(content)

This does not work because split returns a list of strings, while the declared schema calls for a bag of tuples. However, the output schema is known, so it would be quite simple to implicitly promote each string element to a single-field tuple.
Also, a list, array, tuple, set, etc. are all equally convertible to a bag, and a list, array, or tuple is equally convertible to a Tuple. With the schema available, this conversion can be done in a much less rigid way.
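
A minimal sketch of what such schema-guided conversion could look like, in plain Python. This is not the attached patch: the `convert` function and the nested-tuple schema model (`("bag", elem)`, `("tuple", [fields])`, or a scalar type name) are hypothetical stand-ins for Pig's real Schema and data classes, used only to illustrate the promotion rules described above.

```python
import re

# Hypothetical, simplified schema model (NOT Pig's Schema class):
#   ("bag", element_schema) | ("tuple", [field_schema, ...]) | "chararray", "int", ...
WORD_BAG = ("bag", ("tuple", ["chararray"]))  # y:bag{t:tuple(word:chararray)}


def convert(value, schema):
    """Coerce a Python value into the shape the schema asks for."""
    kind = schema[0] if isinstance(schema, tuple) else schema

    if kind == "bag":
        # Any list/tuple/set is accepted as the source of a bag.
        if not isinstance(value, (list, tuple, set, frozenset)):
            raise TypeError("cannot convert %r to a bag" % (value,))
        return [convert(item, schema[1]) for item in value]

    if kind == "tuple":
        fields = schema[1]
        if isinstance(value, (list, tuple)):
            if len(value) != len(fields):
                raise ValueError("expected %d fields, got %d" % (len(fields), len(value)))
            return tuple(convert(v, f) for v, f in zip(value, fields))
        if len(fields) == 1:
            # Promote a bare element to a single-field tuple.
            return (convert(value, fields[0]),)
        raise TypeError("cannot convert %r to a tuple" % (value,))

    # Scalar types pass through unchanged in this sketch.
    return value


words = convert(re.compile(",").split("a,b,c"), WORD_BAG)
```

Here the list of strings returned by `split` is accepted as the bag, and each string is promoted to a one-field tuple because the schema says the bag holds tuples with a single field.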

This allows much easier reuse of existing Python code and avoids the memory overhead of creating intermediate objects just to re-convert types.
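
For comparison, a sketch of the manual workaround, assuming the current engine expects a bag to be returned as a list of tuples. The `try`/`except` block is only a stand-in so the snippet also runs outside Pig, where the `outputSchema` decorator is not predefined; it is an assumption, not Pig's implementation.

```python
import re

try:
    outputSchema  # expected to be provided by Pig's Jython script engine
except NameError:
    # Stand-in decorator (assumption) so the example runs outside Pig.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    # Wrap every word in a one-element tuple by hand, building a second
    # list in addition to the one that split() already returned.
    return [(word,) for word in re.compile(regex).split(content)]
```

The extra list comprehension is the kind of intermediate re-conversion that schema-aware conversion would make unnecessary.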
I wrote the code to do this a while back as part of my version of the Jython script framework; I'll isolate it and attach it.

Attachments

- 1942_with_junit.patch, 65 kB, 03/May/11 18:25, Woody Anderson
- 1942.patch, 37 kB, 03/May/11 07:22, Woody Anderson

People

Assignee: Woody Anderson

Reporter: Woody Anderson

Votes: 0

Watchers: 7

Dates

Created: 29/Mar/11 18:36

Updated: 04/Feb/13 17:38