Pig / PIG-2547

Easier UDFs: Convenient EvalFunc super-classes

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None
    • Release Note:
      New superclasses to make it easier to implement UDFs.

      Description

      We've got a few abstract extensions of EvalFunc that make life easier. If people are interested, we can push these classes into Pig.

      There are 3 classes, each extending the one before it. Class naming is all TBD.

      • TypedOutputEvalFunc<OUT> - Implements public Schema outputSchema(Schema input) based on the generic type of the subclass. Provides common helper validation functions that increment counters for good and bad Tuple data passed in. Useful when the input to be worked on is a tuple of size N or greater.
      • PrimitiveEvalFunc<IN, OUT> - Same as above, with helper validation that lets you subclass it and implement only public OUT exec(IN input), where IN and OUT are primitives. Useful when the input is a single primitive in position 0 of a tuple.
      • FunctionWrapperEvalFunc - Wraps a Guava Function implementation (http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/base/Function.html) so that it can be used as a UDF in Pig scripts like so, where MyFunction is a class that implements Function:
      DEFINE myUdf org.apache.pig.FunctionWrapperEvalFunc('MyFunction');
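      As an illustration only, the validate-then-delegate idea behind PrimitiveEvalFunc can be sketched in self-contained Java. This is not Pig's code: a Tuple is modeled as a List<Object>, the good/bad Tuple counters are plain long fields, and the names PrimitiveFuncSketch, execTuple and UpperCaseSketch are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the PrimitiveEvalFunc idea. A Pig Tuple is modeled
// as a List<Object>, and the good/bad counters as two plain long fields.
abstract class PrimitiveFuncSketch<IN, OUT> {
    private final Class<IN> inputType;
    long goodInputs = 0; // would be a "good tuple" counter in a real UDF
    long badInputs = 0;  // would be a "bad tuple" counter in a real UDF

    protected PrimitiveFuncSketch(Class<IN> inputType) {
        this.inputType = inputType;
    }

    // Validates the tuple, unwraps position 0 and delegates to exec(IN).
    // A null or empty tuple, or a position-0 value of the wrong type
    // (including null), is counted as bad and mapped to a null result.
    final OUT execTuple(List<Object> tuple) {
        if (tuple == null || tuple.isEmpty() || !inputType.isInstance(tuple.get(0))) {
            badInputs++;
            return null;
        }
        goodInputs++;
        return exec(inputType.cast(tuple.get(0)));
    }

    // The only method a concrete UDF has to implement.
    protected abstract OUT exec(IN input);
}

// Example subclass: upper-cases a single chararray-like String argument.
class UpperCaseSketch extends PrimitiveFuncSketch<String, String> {
    UpperCaseSketch() {
        super(String.class);
    }

    @Override
    protected String exec(String input) {
        return input.toUpperCase(Locale.ROOT);
    }
}
```

      With this sketch, new UpperCaseSketch().execTuple(Arrays.asList((Object) "pig")) returns "PIG", while a tuple whose first field is not a String returns null and increments badInputs. Counting and nulling bad input instead of throwing is an assumption made for the sketch; the description only says that the helpers increment counters.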
      
      Attachments

      1. PIG_2547.1.patch (38 kB, Bill Graham)
      2. PIG_2547.2.patch (33 kB, Bill Graham)
      3. PIG-2547.3.patch (33 kB, Bill Graham)

        Activity

        Bill Graham created issue -
        Bill Graham made changes -
        Description: inserted the missing class count ("There are classes" became "There are 3 classes").

        Dmitriy V. Ryaboy added a comment -

        This is kind of a group contrib: Jimmy Lin, myself, Bill Graham, and possibly even some Kevin Weil code in there.

        Can non-Twitter folks check this out and comment? We've found this structure quite useful for making our UDFs lightweight. Abstracting common processing functionality into Guava Functions lets us share business logic between multiple systems instead of embedding it inside UDFs. We think it's pretty neat.

        Bill Graham added a comment -

        No uploaded patch yet to look at, still in the process of refactoring.

        Bill Graham added a comment -

        Here's a first pass at a patch. It includes a number of new files:

        src/org/apache/pig/ExceptionalFunction.java
        src/org/apache/pig/Function.java
        src/org/apache/pig/FunctionWrapperEvalFunc.java
        src/org/apache/pig/PrimitiveEvalFunc.java
        src/org/apache/pig/TypedOutputEvalFunc.java
        test/org/apache/pig/TestFunctionWrapperEvalFunc.java
        test/org/apache/pig/TestPrimitiveEvalFunc.java
        test/org/apache/pig/TestTypedOutputEvalFunc.java
        

        The new Pig Function interface was added as a common subinterface of Google's Function and a new Pig ExceptionalFunction.

        I went to some pains to support all three interfaces in FunctionWrapperEvalFunc. (Dmitriy, this is different from our impl, which doesn't support Google's Function.)

        Please take a look and let me know what you think.
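        The interface layering described in this comment, with one Function type extending both Google's Function and an ExceptionalFunction, can be sketched in plain Java. Assumptions: Guava is not on the classpath, so java.util.function.Function stands in for com.google.common.base.Function (both have a single abstract apply method), and every name ending in Sketch, plus TrimFunction, LengthFunction and ParseIntFunction, is hypothetical rather than the patch's API.

```java
// Hypothetical sketch of the interface layering. Guava is not assumed to be on
// the classpath, so java.util.function.Function stands in for
// com.google.common.base.Function; both declare a single abstract apply method.
interface ExceptionalFunctionSketch<S, T, E extends Exception> {
    T apply(S input) throws E;
}

// Common subinterface: one apply(S) implementation satisfies both parents.
interface FunctionSketch<S, T>
        extends ExceptionalFunctionSketch<S, T, RuntimeException>, java.util.function.Function<S, T> {
}

// Wrapper in the spirit of FunctionWrapperEvalFunc: it receives a class name,
// as in DEFINE myUdf ...FunctionWrapperEvalFunc('MyFunction'), instantiates it
// reflectively and accepts any of the three interface types.
final class FunctionWrapperSketch {
    private final Object function;

    FunctionWrapperSketch(String className) throws ReflectiveOperationException {
        function = Class.forName(className).getDeclaredConstructor().newInstance();
        if (!(function instanceof ExceptionalFunctionSketch)
                && !(function instanceof java.util.function.Function)) {
            throw new IllegalArgumentException(className + " implements no supported function interface");
        }
    }

    @SuppressWarnings("unchecked")
    Object exec(Object input) throws Exception {
        if (function instanceof ExceptionalFunctionSketch) {
            // Covers ExceptionalFunctionSketch and its subinterface FunctionSketch.
            return ((ExceptionalFunctionSketch<Object, Object, Exception>) function).apply(input);
        }
        return ((java.util.function.Function<Object, Object>) function).apply(input);
    }
}

// Written once against the common subinterface.
class TrimFunction implements FunctionSketch<Object, Object> {
    @Override
    public Object apply(Object input) {
        return input.toString().trim();
    }
}

// Plain java.util.function.Function (the Guava stand-in).
class LengthFunction implements java.util.function.Function<Object, Object> {
    @Override
    public Object apply(Object input) {
        return input.toString().length();
    }
}

// ExceptionalFunction that declares a checked exception.
class ParseIntFunction implements ExceptionalFunctionSketch<Object, Object, java.io.IOException> {
    @Override
    public Object apply(Object input) throws java.io.IOException {
        try {
            return Integer.parseInt(input.toString());
        } catch (NumberFormatException e) {
            throw new java.io.IOException("not an integer: " + input, e);
        }
    }
}
```

        Because FunctionSketch merges the two override-equivalent apply methods into one, a single implementation such as TrimFunction can be handed to code expecting either parent type. The wrapper checks ExceptionalFunctionSketch first so that checked exceptions (here java.io.IOException) propagate through exec.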

        Bill Graham made changes -
        Attachment PIG_2547.1.patch [ 12515676 ]
        Bill Graham made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Bill Graham added a comment -

        Attaching a 2nd patch, please disregard the 1st.

        Bill Graham made changes -
        Attachment PIG_2547.2.patch [ 12516197 ]
        Jonathan Coveney made changes -
        Fix Version/s 0.10 [ 12316246 ]
        Fix Version/s 0.11 [ 12318878 ]
        Jonathan Coveney made changes -
        Labels 0.10_blocker
        Dmitriy V. Ryaboy added a comment -

        Bumping for review

        Daniel Dai added a comment -

        Looks good. A couple of comments:
        1. TypedOutputEvalFunc: getting the output schema from the generic type is already part of EvalFunc. Besides, EvalFunc.outputSchema can now also handle annotations. Do we still need TypedOutputEvalFunc.outputSchema?
        2. FunctionWrapperEvalFunc: it would be better in the builtin package, so that users can omit the package name in Pig scripts. Also, TestFunctionWrapperEvalFunc seems to have no coverage for wrapping a Guava Function; it would be good to add some.
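        Daniel's first point is about deriving the output type from the UDF's generic parameter. The underlying reflection technique can be sketched in self-contained Java; it is not EvalFunc's or TypedOutputEvalFunc's actual code. TypedOutputSketch, pigOutputType, the Java-to-Pig type table and the "bytearray" fallback are all assumptions for illustration.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class: OUT is recovered at runtime from the concrete
// subclass declaration, e.g. "class WordCountSketch extends TypedOutputSketch<Integer>".
abstract class TypedOutputSketch<OUT> {
    // Assumed mapping from Java boxed types to Pig type names.
    private static final Map<Class<?>, String> PIG_TYPES = new HashMap<Class<?>, String>();
    static {
        PIG_TYPES.put(String.class, "chararray");
        PIG_TYPES.put(Integer.class, "int");
        PIG_TYPES.put(Long.class, "long");
        PIG_TYPES.put(Float.class, "float");
        PIG_TYPES.put(Double.class, "double");
        PIG_TYPES.put(Boolean.class, "boolean");
    }

    // Finds the class that directly extends TypedOutputSketch and reads its type argument.
    final Class<?> outputClass() {
        Class<?> c = getClass();
        while (c.getSuperclass() != TypedOutputSketch.class) {
            c = c.getSuperclass();
        }
        Type superType = c.getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) superType).getActualTypeArguments()[0];
            if (arg instanceof Class) {
                return (Class<?>) arg;
            }
        }
        // Raw subclasses, or OUT bound to another type variable, are not resolved here.
        throw new IllegalStateException("cannot resolve OUT for " + getClass().getName());
    }

    // Pig type name for OUT; falls back to "bytearray" for unmapped classes (an assumption).
    final String pigOutputType() {
        String pigType = PIG_TYPES.get(outputClass());
        return pigType != null ? pigType : "bytearray";
    }
}

class WordCountSketch extends TypedOutputSketch<Integer> {}

class GreetingSketch extends TypedOutputSketch<String> {}

// One extra level: the type argument is still read from GreetingSketch's declaration.
class LoudGreetingSketch extends GreetingSketch {}
```

        This simple version only reads the type argument from the class that directly extends TypedOutputSketch; resolving OUT through an intermediate generic class (for example something shaped like PrimitiveEvalFunc<IN, OUT>) would require mapping type variables to their bindings, which the sketch deliberately leaves out.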

        Daniel Dai made changes -
        Fix Version/s 0.10.1 [ 12320547 ]
        Fix Version/s 0.10.0 [ 12316246 ]
        Bill Graham made changes -
        Labels 0.10_blocker
        Bill Graham added a comment -

        Thanks Daniel, here's a 3rd patch with your comments incorporated. Note that there is a Guava Function unit test (see IntegerFloatFunction).

        Bill Graham made changes -
        Attachment PIG-2547.3.patch [ 12526579 ]
        Bill Graham made changes -
        Fix Version/s 0.10.1 [ 12320547 ]
        Daniel Dai added a comment -

        +1, please commit to trunk.

        Bill Graham added a comment -

        Committed

        Bill Graham made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Release Note New superclasses to make it easier to implement UDFs.
        Resolution Fixed [ 1 ]
        Bill Graham made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                  | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open → Patch Available      | 1d 2h 53m             | 1               | Bill Graham   | 23/Feb/12 01:56
        Patch Available → Resolved  | 78d 23h 50m           | 1               | Bill Graham   | 12/May/12 02:46
        Resolved → Closed           | 286d 3h 7m            | 1               | Bill Graham   | 22/Feb/13 04:54

          People

          • Assignee: Bill Graham
          • Reporter: Bill Graham
          • Votes: 0
          • Watchers: 2
