Type: New Feature
Affects Version/s: 0.11
Fix Version/s: None
The current EvalFunc interface (and associated Algebraic and Accumulator interfaces) have grown unwieldy. In particular, people have noted the following issues:
- Writing a UDF requires a lot of boiler plate code.
- Since UDFs always pass a tuple, users are required to manage their own type checking for input.
- Declaring schemas for output data is confusing.
- Writing a UDF that accepts multiple different parameters (using getArgToFuncMapping) is confusing.
- Using Algebraic and Accumulator interfaces often entails duplicating code from the initial implementation.
- UDF implementors are exposed to the internals of Pig since they have to know when to return a tuple (Initial, Intermediate) and when not to (exec, Final).
- The separation of Initial, Intermediate, and Final into separate classes forces code duplication and makes it hard for UDFs in other languages to use those interfaces.
- There is unused code in the current interface that occasionally causes confusion (e.g. isAsynchronous)
Any change must be done in a way that allows existing UDFs to continue working essentially forever.