• Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.11
    • Fix Version/s: None
    • Component/s: impl
    • Labels:


      The current EvalFunc interface (and the associated Algebraic and Accumulator interfaces) has grown unwieldy. In particular, people have noted the following issues:

      1. Writing a UDF requires a lot of boilerplate code.
      2. Since UDFs always pass a tuple, users are required to manage their own type checking for input.
      3. Declaring schemas for output data is confusing.
      4. Writing a UDF that accepts multiple different parameters (using getArgToFuncMapping) is confusing.
      5. Using Algebraic and Accumulator interfaces often entails duplicating code from the initial implementation.
      6. UDF implementors are exposed to the internals of Pig since they have to know when to return a tuple (Initial, Intermediate) and when not to (exec, Final).
      7. The separation of Initial, Intermediate, and Final into separate classes forces code duplication and makes it hard for UDFs in other languages to use those interfaces.
      8. There is unused code in the current interface that occasionally causes confusion (e.g. isAsynchronous).
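
      Issues 1 and 2 can be illustrated with a minimal, self-contained sketch. It does not use the real Pig classes: SimpleTuple and TupleFunc are hypothetical stand-ins for an untyped tuple and a tuple-in, value-out function, assumed here only to show the shape of the problem. A function that merely upper-cases one string still has to check arity, nulls, and the runtime type by hand.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class UdfBoilerplateDemo {

    // Hypothetical stand-in for an untyped tuple; NOT org.apache.pig.data.Tuple.
    static final class SimpleTuple {
        private final List<Object> fields;

        SimpleTuple(Object... fields) {
            this.fields = Arrays.asList(fields);
        }

        int size() {
            return fields.size();
        }

        Object get(int index) {
            return fields.get(index);
        }
    }

    // Hypothetical stand-in for a function that always receives a tuple.
    abstract static class TupleFunc<T> {
        abstract T exec(SimpleTuple input);
    }

    // The actual logic is one line; everything else is input validation
    // that every such function has to repeat for itself.
    static final class Upper extends TupleFunc<String> {
        @Override
        String exec(SimpleTuple input) {
            if (input == null || input.size() != 1) {
                throw new IllegalArgumentException("expected exactly one argument");
            }
            Object field = input.get(0);
            if (field == null) {
                return null; // propagate null input as null output
            }
            if (!(field instanceof String)) {
                throw new IllegalArgumentException(
                        "expected a String, got " + field.getClass().getSimpleName());
            }
            return ((String) field).toUpperCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        Upper upper = new Upper();
        System.out.println(upper.exec(new SimpleTuple("pig"))); // prints "PIG"
        try {
            upper.exec(new SimpleTuple(42)); // wrong type is only caught at run time
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      In this sketch the type error surfaces only when the function runs, which is the kind of per-UDF checking that issue 2 describes.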

      Any change must be done in a way that allows existing UDFs to continue working essentially forever.

      1. examples.patch
        9 kB
        Julien Le Dem
      2. PIG-newudf.patch
        87 kB
        Alan Gates


            • Assignee:
              Alan Gates
            • Votes:
              0
            • Watchers:
              13


              • Created: