Pig / PIG-2547

Easier UDFs: Convenient EvalFunc super-classes

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None
    • Release Note:
      New superclasses to make it easier to implement UDFs.

      Description

      We've got a few abstract extensions of EvalFunc that make life easier. If people are interested we can push said classes into Pig.

      There are 3 classes, each extending the next. Class naming is all TBD.

      • TypedOutputEvalFunc<OUT> - Implements public Schema outputSchema(Schema input) based on the generic type of the subclass. Provides common helper validation functions which increment counters for good and bad Tuple data passed. Useful where the input to be worked on is a tuple of size N or greater.
      • PrimitiveEvalFunc<IN, OUT> - Same as above, with helper validation, so that a subclass only needs to implement public OUT exec(IN input), where IN and OUT are primitives. Useful when the input is a single primitive in position 0 of a tuple.
      • FunctionWrapperEvalFunc - Wraps a Guava Function implementation (http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/base/Function.html) and allows UDFs to be used in Pig scripts like so, where MyFunction is a class that implements Function:
      DEFINE myUdf org.apache.pig.FunctionWrapperEvalFunc('MyFunction')
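
      A rough, self-contained sketch of how the three layers described above could fit together. This is not the code from the patch: List<Object> stands in for a Pig Tuple, java.util.function.Function stands in for Guava's Function, and the names TypedOutputFunc, PrimitiveFunc, FunctionWrapper, StringLength, and Shout are hypothetical.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class UdfSketch {

    // Hypothetical analogue of TypedOutputEvalFunc<OUT>.
    abstract static class TypedOutputFunc<OUT> {
        int good = 0; // tuples that passed validation
        int bad = 0;  // tuples that failed validation

        // Reads OUT from the concrete subclass's generic superclass.
        // Assumption: the subclass directly names concrete type arguments.
        Class<?> outputType() {
            ParameterizedType pt = (ParameterizedType) getClass().getGenericSuperclass();
            Type[] typeArgs = pt.getActualTypeArguments();
            return (Class<?>) typeArgs[typeArgs.length - 1];
        }

        // Validation helper: counts tuples with at least minSize fields as good.
        boolean validate(List<Object> tuple, int minSize) {
            boolean ok = tuple != null && tuple.size() >= minSize;
            if (ok) { good++; } else { bad++; }
            return ok;
        }
    }

    // Hypothetical analogue of PrimitiveEvalFunc<IN, OUT>.
    abstract static class PrimitiveFunc<IN, OUT> extends TypedOutputFunc<OUT> {
        // The only method a subclass has to write.
        abstract OUT exec(IN input);

        // Unwraps the field in position 0 and delegates to exec(IN).
        @SuppressWarnings("unchecked")
        OUT execTuple(List<Object> tuple) {
            if (!validate(tuple, 1) || tuple.get(0) == null) {
                return null;
            }
            return exec((IN) tuple.get(0));
        }
    }

    // Example subclass: one method, no tuple handling.
    static class StringLength extends PrimitiveFunc<String, Integer> {
        @Override Integer exec(String input) { return input.length(); }
    }

    // Example business-logic function that knows nothing about tuples.
    public static class Shout implements Function<String, String> {
        @Override public String apply(String s) { return s.toUpperCase() + "!"; }
    }

    // Hypothetical analogue of FunctionWrapperEvalFunc: loads a function by class name.
    static class FunctionWrapper extends PrimitiveFunc<Object, Object> {
        private final Function<Object, Object> fn;

        @SuppressWarnings("unchecked")
        FunctionWrapper(String className) {
            try {
                fn = (Function<Object, Object>)
                        Class.forName(className).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot instantiate " + className, e);
            }
        }

        @Override Object exec(Object input) { return fn.apply(input); }
    }

    public static void main(String[] args) {
        StringLength len = new StringLength();
        System.out.println(len.outputType().getSimpleName());
        System.out.println(len.execTuple(Arrays.asList((Object) "pig")));

        List<Object> empty = new ArrayList<>();
        System.out.println(len.execTuple(empty) + " good=" + len.good + " bad=" + len.bad);

        FunctionWrapper wrapped = new FunctionWrapper(Shout.class.getName());
        System.out.println(wrapped.execTuple(Arrays.asList((Object) "pig")));
    }
}
```

      The point of the layering in this sketch is that Shout carries no tuple handling at all, so the same class could be reused outside of a UDF, while the wrapper supplies the unwrapping and validation.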
      
      1. PIG-2547.3.patch
        33 kB
        Bill Graham
      2. PIG_2547.2.patch
        33 kB
        Bill Graham
      3. PIG_2547.1.patch
        38 kB
        Bill Graham

        Activity

        Dmitriy V. Ryaboy added a comment -

        This is kind of a group contrib – Jimmy Lin, Self, Bill Graham, possibly even some Kevin Weil code in there.

        Can non-Twitter folks check this out and comment? We've found this structure quite useful for making our UDFs light-weight. Abstracting common processing functionality into Guava Functions lets us share business logic between multiple systems, instead of embedding it inside UDFs. We think it's pretty neat.

        Bill Graham added a comment -

        No uploaded patch yet to look at, still in the process of refactoring.

        Bill Graham added a comment -

        Here's a first pass at a patch. It includes a number of new files:

        src/org/apache/pig/ExceptionalFunction.java
        src/org/apache/pig/Function.java
        src/org/apache/pig/FunctionWrapperEvalFunc.java
        src/org/apache/pig/PrimitiveEvalFunc.java
        src/org/apache/pig/TypedOutputEvalFunc.java
        test/org/apache/pig/TestFunctionWrapperEvalFunc.java
        test/org/apache/pig/TestPrimitiveEvalFunc.java
        test/org/apache/pig/TestTypedOutputEvalFunc.java
        

        The new Pig Function interface was added as a common subinterface to Google's Function and a new Pig ExceptionalFunction.

        I went through some pains to support all three interfaces in FunctionWrapperEvalFunc. (Dmitriy, this is different than our impl, which doesn't include support for Google's Function.)

        Please take a look and let me know what you think.

        Bill Graham added a comment -

        Attaching a 2nd patch, please disregard the 1st.

        Dmitriy V. Ryaboy added a comment -

        Bumping for review

        Daniel Dai added a comment -

        Looks good, couple of comments:
        1. TypedOutputEvalFunc: Getting the output schema from the generic type is already part of EvalFunc; besides, EvalFunc.outputSchema can also handle annotations now. Do we still need TypedOutputEvalFunc.outputSchema?
        2. FunctionWrapperEvalFunc: It would be better in the builtin package, so that users can omit the package name in Pig scripts. Also, there seems to be no coverage for wrapping a Guava function in TestFunctionWrapperEvalFunc; it would be good to have some.

        Bill Graham added a comment -

        Thanks Daniel, here's a 3rd patch with your comments incorporated. Note that there is a Guava Function unit test (see IntegerFloatFunction).

        Daniel Dai added a comment -

        +1, please commit to trunk.

        Bill Graham added a comment -

        Committed


          People

          • Assignee:
            Bill Graham
            Reporter:
            Bill Graham
          • Votes:
            0
            Watchers:
            2
