  1. Pig
  2. PIG-1613

Explain how different UDF interfaces are used

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.7.0
    • 0.8.0
    • documentation
    • None
    • Release Note:
      I think this should go into Advanced Topics in the UDF manual.

      A UDF can be invoked in several ways. The simplest UDF extends EvalFunc, which requires only the exec function to be implemented, as described in the How to Write a Simple Eval Function section. Every eval UDF must implement exec. Additionally, if a function is algebraic, it can implement the Algebraic interface, which can significantly improve query performance when the combiner can be used. The Aggregate Functions section covers this topic in detail. Finally, a function that can process tuples incrementally can also implement the Accumulator interface to reduce the query's memory consumption. The Accumulator Interface section explains this interface.
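
      As a rough illustration of why one function might implement all three interfaces, here is a minimal, self-contained sketch of a COUNT-like function. The types SimpleEval, SimpleAccumulator, and SimpleAlgebraic and the class CountSketch are hypothetical stand-ins invented for this example. They are not Pig's real EvalFunc, Accumulator, and Algebraic types, whose signatures differ, so treat this as a model of the idea rather than as working Pig code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT Pig's real interfaces.
interface SimpleEval<I, O> {
    O exec(I input);                    // sees the whole input at once
}

interface SimpleAccumulator<I, O> {
    void accumulate(I batch);           // sees the input one batch at a time
    O getValue();
    void cleanup();
}

interface SimpleAlgebraic<I, P, O> {
    P initial(I chunk);                 // partial result for one chunk
    P intermediate(List<P> partials);   // combine partial results
    O finalStep(List<P> partials);      // produce the final result
}

class CountSketch implements SimpleEval<List<String>, Long>,
        SimpleAccumulator<List<String>, Long>,
        SimpleAlgebraic<List<String>, Long, Long> {

    private long running = 0;

    // Plain path: count everything in one call.
    public Long exec(List<String> bag) { return (long) bag.size(); }

    // Incremental path: only a running total is kept between batches.
    public void accumulate(List<String> batch) { running += batch.size(); }
    public Long getValue() { return running; }
    public void cleanup() { running = 0; }

    // Algebraic path: count chunks independently, then sum the partial counts.
    public Long initial(List<String> chunk) { return (long) chunk.size(); }
    public Long intermediate(List<Long> partials) { return sum(partials); }
    public Long finalStep(List<Long> partials) { return sum(partials); }

    private static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    public static void main(String[] args) {
        List<String> bag = Arrays.asList("a", "b", "c", "d");
        CountSketch udf = new CountSketch();

        long viaExec = udf.exec(bag);

        udf.accumulate(bag.subList(0, 2));
        udf.accumulate(bag.subList(2, 4));
        long viaAccumulator = udf.getValue();
        udf.cleanup();

        Long p1 = udf.initial(bag.subList(0, 1));
        Long p2 = udf.initial(bag.subList(1, 4));
        Long combined = udf.intermediate(Arrays.asList(p1, p2));
        long viaAlgebraic = udf.finalStep(Arrays.asList(combined));

        // All three paths should agree on the count.
        System.out.println(viaExec + " " + viaAccumulator + " " + viaAlgebraic);
    }
}
```

      All three paths compute the same answer. They differ only in how much of the input each call has to see at once, which is why a single function may reasonably provide all of them.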

      The optimizer selects the method by which a UDF is invoked based on the UDF type and the query. Only a single interface is used at any given time. The optimizer tries to find the most efficient way to execute the function. If the combiner is used and the function implements the Algebraic interface, that interface is used to invoke the function. If the combiner is not used but the accumulator can be used and the function implements the Accumulator interface, that interface is used. If neither condition is satisfied, the exec function is used to invoke the UDF.
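
      The precedence described above can be written down as a small decision function. This is only a sketch of the rule as stated in this note. The class UdfDispatchSketch, the enum Path, and the four boolean parameters are hypothetical names chosen for illustration, and none of this is Pig's actual optimizer code.

```java
// Hypothetical model of the precedence rule described in the text;
// NOT taken from Pig's optimizer.
class UdfDispatchSketch {

    enum Path { ALGEBRAIC, ACCUMULATOR, EXEC }

    static Path choose(boolean combinerUsed,
                       boolean accumulatorUsable,
                       boolean implementsAlgebraic,
                       boolean implementsAccumulator) {
        // 1. Combiner is used and the function is Algebraic.
        if (combinerUsed && implementsAlgebraic) {
            return Path.ALGEBRAIC;
        }
        // 2. No combiner, but the accumulator can be used and the
        //    function implements Accumulator.
        if (!combinerUsed && accumulatorUsable && implementsAccumulator) {
            return Path.ACCUMULATOR;
        }
        // 3. Otherwise fall back to exec, which every eval UDF has.
        return Path.EXEC;
    }

    public static void main(String[] args) {
        // A function implementing both interfaces, in three query situations.
        System.out.println(choose(true, true, true, true));
        System.out.println(choose(false, true, true, true));
        System.out.println(choose(false, false, true, true));
    }
}
```

      Exactly one path is returned for any combination of inputs, matching the statement that only a single interface is used at any given time.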

    Description

      The current documentation describes individual UDF interfaces such as Algebraic and Accumulator, but not their precedence, how they interact with each other, or why you might want to implement several of them.

      Corinne, I will add release notes to this JIRA shortly. Don't worry about it until then.

          People

            chandec Corinne Chandel
            olgan Olga Natkovich
