• Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.11
    • Fix Version/s: None
    • Component/s: impl
    • Labels:


      The current EvalFunc interface, together with the associated Algebraic and Accumulator interfaces, has grown unwieldy. In particular, people have noted the following issues:

      1. Writing a UDF requires a lot of boilerplate code.
      2. Since UDFs are always passed a tuple, users are required to manage their own type checking for input.
      3. Declaring schemas for output data is confusing.
      4. Writing a UDF that accepts multiple different parameters (using getArgToFuncMapping) is confusing.
      5. Using Algebraic and Accumulator interfaces often entails duplicating code from the initial implementation.
      6. UDF implementors are exposed to the internals of Pig since they have to know when to return a tuple (Initial, Intermediate) and when not to (exec, Final).
      7. The separation of Initial, Intermediate, and Final into separate classes forces code duplication and makes it hard for UDFs in other languages to use those interfaces.
      8. There is unused code in the current interface that occasionally causes confusion (e.g., isAsynchronous).
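
      Issues 1 and 2 can be illustrated with a minimal sketch. It does not use the real Pig classes: SketchTuple and SketchEvalFunc are hypothetical stand-ins that only mimic the general shape of an EvalFunc whose exec method receives a single untyped tuple. The point is how much argument checking surrounds one line of actual logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a Pig tuple: an ordered list of untyped fields.
final class SketchTuple {
    private final List<Object> fields;

    SketchTuple(Object... fields) {
        this.fields = Arrays.asList(fields);
    }

    int size() {
        return fields.size();
    }

    Object get(int i) {
        return fields.get(i);
    }
}

// Hypothetical stand-in for the shape of EvalFunc<T>:
// one untyped tuple in, one value out.
abstract class SketchEvalFunc<T> {
    abstract T exec(SketchTuple input);
}

// A UDF that upper-cases a string. Only the last line of exec is real logic;
// everything before it is arity, null, and type checking the author must write.
class SketchUpper extends SketchEvalFunc<String> {
    @Override
    String exec(SketchTuple input) {
        if (input == null || input.size() == 0) {
            return null; // nothing to work on
        }
        if (input.size() != 1) {
            throw new IllegalArgumentException(
                "expected exactly 1 argument, got " + input.size());
        }
        Object field = input.get(0);
        if (field == null) {
            return null; // null in, null out
        }
        if (!(field instanceof String)) {
            throw new IllegalArgumentException(
                "expected a String, got " + field.getClass().getName());
        }
        return ((String) field).toUpperCase(Locale.ROOT);
    }
}
```

      A typed entry point (for example, a method taking a String directly) would remove every check above except the null handling, which is the kind of simplification this issue asks for.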

      Any change must be done in a way that allows existing UDFs to continue working essentially forever.


        1. examples.patch
          9 kB
          Julien Le Dem
        2. PIG-newudf.patch
          87 kB
          Alan Gates

              • Assignee:
                alangates Alan Gates
                alangates Alan Gates
              • Votes:
                0
              • Watchers:
                13


                • Created: