Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
0.11
None
None
Description
The current EvalFunc interface (and the associated Algebraic and Accumulator interfaces) has grown unwieldy. In particular, users have noted the following issues:
- Writing a UDF requires a lot of boilerplate code.
- Since UDFs always receive their input as a single tuple, users must do their own type checking on that input.
- Declaring schemas for output data is confusing.
- Writing a UDF that accepts several different argument signatures (using getArgToFuncMapping) is confusing.
- Using the Algebraic and Accumulator interfaces often entails duplicating code from the main implementation.
- UDF implementors are exposed to Pig internals, because they have to know when to return a tuple (Initial, Intermediate) and when not to (exec, Final).
- Splitting Initial, Intermediate, and Final into separate classes forces code duplication and makes it hard for UDFs written in other languages to use those interfaces.
- There is unused code in the current interface that occasionally causes confusion (e.g., isAsynchronous).
Any change must be done in a way that allows existing UDFs to continue working essentially forever.
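To make the boilerplate and type-checking issues above concrete, here is a small self-contained Java sketch. Everything in it is hypothetical: `Tuple`, `TupleUdf`, `ConcatTupleUdf`, `TypedUdf2`, and `adapt` are illustrative stand-ins, not Pig's actual `EvalFunc`/`Tuple` API and not a proposed replacement design. `ConcatTupleUdf` shows the arity, null, and type checks a tuple-based UDF author currently writes by hand; `adapt` shows how those same checks could, in principle, be written once by the framework instead of in every UDF.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (NOT Pig's real API): minimal stand-ins that show the
 * manual checks a tuple-in/object-out UDF contract pushes onto every author.
 */
public class UdfBoilerplateSketch {

    /** Hypothetical stand-in for a Pig Tuple: an ordered list of untyped fields. */
    static final class Tuple {
        private final List<Object> fields;
        Tuple(Object... fields) { this.fields = Arrays.asList(fields); }
        int size() { return fields.size(); }
        Object get(int i) { return fields.get(i); }
    }

    /** Hypothetical stand-in for the tuple-in, value-out shape of EvalFunc.exec. */
    interface TupleUdf<T> {
        T exec(Tuple input);
    }

    /** Tuple-based UDF: arity, null, and type checks are all the author's job. */
    static final class ConcatTupleUdf implements TupleUdf<String> {
        @Override
        public String exec(Tuple input) {
            if (input == null || input.size() != 2) {
                return null; // wrong arity: nothing sensible to return
            }
            Object a = input.get(0);
            Object b = input.get(1);
            if (!(a instanceof String) || !(b instanceof String)) {
                throw new IllegalArgumentException(
                    "expected (chararray, chararray), got " + describe(a) + ", " + describe(b));
            }
            return (String) a + (String) b;
        }

        private static String describe(Object o) {
            return o == null ? "null" : o.getClass().getSimpleName();
        }
    }

    /** Hypothetical typed UDF shape: the body contains only the logic. */
    @FunctionalInterface
    interface TypedUdf2<A, B, R> {
        R apply(A a, B b);
    }

    /** Hypothetical adapter: the checks live here once, not in every UDF. */
    static <A, B, R> TupleUdf<R> adapt(Class<A> ta, Class<B> tb, TypedUdf2<A, B, R> f) {
        return input -> {
            if (input == null || input.size() != 2) {
                return null; // same arity rule as the hand-written version
            }
            Object a = input.get(0);
            Object b = input.get(1);
            if (!ta.isInstance(a) || !tb.isInstance(b)) {
                throw new IllegalArgumentException("argument types do not match signature");
            }
            return f.apply(ta.cast(a), tb.cast(b));
        };
    }

    public static void main(String[] args) {
        TupleUdf<String> manual = new ConcatTupleUdf();
        TupleUdf<String> typed = adapt(String.class, String.class, (a, b) -> a + b);
        System.out.println(manual.exec(new Tuple("pig", "latin")));
        System.out.println(typed.exec(new Tuple("pig", "latin")));
    }
}
```

Under these assumptions, both calls in `main` print `piglatin`; the only difference is where the checking code lives. In the hand-written version it is repeated in each UDF, while in the adapted version the UDF body is a one-line lambda.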
Attachments
Issue Links
- is related to
  - PIG-2430 An EvalFunc which overrides getArgToFuncMapping with FuncSpec with constructor arguments is not properly instantiated with said arguments (Closed)
  - PIG-2344 UDF / LoadFunc / StoreFunc should be serializable (Open)
- relates to
  - PIG-2699 Reduce the number of instances of Load and Store Funcs down to 2+1. It should be 1 in the front-end and 1 in the backend (Closed)