PIG-2620

Customizable Error Handling in Pig

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When a UDF throws an exception, Pig currently fails and stops processing. We want to extend this behavior to give users finer-grained control over error handling.

      Depending on the use case, there are several options users would like to have:

      • Stop the execution and report an error
      • Ignore tuples that cause exceptions and log warnings
      • Ignore tuples that cause exceptions and redirect them to an error relation (to enable statistics, debugging, ...)
      • Write their own error handler
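
For illustration only, the four options above could be modelled as interchangeable handlers. This is a rough sketch in plain Java, not Pig code: `ErrorPolicySketch`, `ErrorHandler`, `stop`, `ignoreAndWarn`, `redirect`, and `apply` are all hypothetical names that do not exist in Pig, and in-memory lists stand in for relations and for the warning log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; none of these names exist in Pig. */
final class ErrorPolicySketch {

    /** Option 4: a callback users could implement to write their own handler. */
    interface ErrorHandler<I> {
        /** Returns true to skip the record and continue, false to abort. */
        boolean onError(I input, Exception error);
    }

    /** Option 1: stop the execution and report the error. */
    static <I> ErrorHandler<I> stop() {
        return (input, error) -> false;
    }

    /** Option 2: ignore the failing record and record a warning. */
    static <I> ErrorHandler<I> ignoreAndWarn(List<String> warnings) {
        return (input, error) -> {
            warnings.add("skipped " + input + ": " + error.getMessage());
            return true;
        };
    }

    /** Option 3: ignore the failing record and redirect it to an error relation. */
    static <I> ErrorHandler<I> redirect(List<I> errorRelation) {
        return (input, error) -> {
            errorRelation.add(input);
            return true;
        };
    }

    /** Applies the UDF to every record and delegates failures to the handler. */
    static <I, O> List<O> apply(List<I> records, Function<I, O> udf, ErrorHandler<I> handler) {
        List<O> out = new ArrayList<>();
        for (I record : records) {
            try {
                out.add(udf.apply(record));
            } catch (RuntimeException e) {
                if (!handler.onError(record, e)) {
                    throw new IllegalStateException("UDF failed on " + record, e);
                }
            }
        }
        return out;
    }
}
```

With this shape, a script-level choice such as "redirect to an error relation" would only swap the handler passed to `apply`. How such a choice would be expressed in Pig Latin is left to the wiki discussion.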

      1. error_flow.png
        84 kB
        Lorand Bendig
      2. rewrite_example.txt
        50 kB
        Lorand Bendig

        Activity

        Lorand Bendig made changes -
        Attachment error_flow.png [ 12639209 ]
        Attachment rewrite_example.txt [ 12639210 ]
        Lorand Bendig added a comment -

        Based on the discussion on the wiki page I have further elaborated the implementation details. Two files are attached:

        • a possible rewrite of the ONERROR syntax with explain plans
        • a diagram about the error/ignored result propagation back from the EvalFuncs.

        Some notes:

        The ONERROR syntax has been substituted with existing Pig operators, so that the optimizers/visitors can fully understand and process the logical plan.

        The diagram shows publish-subscribe communication between EvalFuncs/POUserFuncs and POForeach, which lets an EvalFunc report information back at the moment it returns null because of an invalid record. This information (the thrown exception, the detail message, etc.) is needed to create the tuple for the error relation.
        Guava's EventBus could be a good candidate for this purpose.

        What do you think?
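
To make the publish-subscribe idea concrete, here is a rough, dependency-free sketch in plain Java. It is not taken from the attachments. `ErrorBusSketch`, `UdfErrorEvent`, `Bus`, `parseOrNull`, and `foreach` are hypothetical names, and the hand-rolled `Bus` only stands in for a library such as Guava's EventBus, which uses `register`/`post` and `@Subscribe`-annotated methods instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; these classes are neither Pig's nor Guava's. */
final class ErrorBusSketch {

    /** Event describing why a function returned null for a record. */
    static final class UdfErrorEvent {
        final Object input;
        final Exception cause;

        UdfErrorEvent(Object input, Exception cause) {
            this.input = input;
            this.cause = cause;
        }
    }

    /** Minimal synchronous bus (assumption: a stand-in for an event bus library). */
    static final class Bus {
        private final List<Consumer<UdfErrorEvent>> subscribers = new ArrayList<>();

        void subscribe(Consumer<UdfErrorEvent> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(UdfErrorEvent event) {
            for (Consumer<UdfErrorEvent> subscriber : subscribers) {
                subscriber.accept(event);
            }
        }
    }

    /** Plays the role of an EvalFunc: returns null on bad input and publishes the cause. */
    static Integer parseOrNull(String input, Bus bus) {
        try {
            return Integer.valueOf(input);
        } catch (NumberFormatException e) {
            bus.publish(new UdfErrorEvent(input, e));
            return null;
        }
    }

    /** Plays the role of the foreach operator: keeps good results, builds error tuples. */
    static List<Integer> foreach(List<String> records, List<String> errorRelation) {
        Bus bus = new Bus();
        // The subscriber turns each event into a tab-separated "error tuple".
        bus.subscribe(e -> errorRelation.add(
                e.input + "\t" + e.cause.getClass().getSimpleName() + "\t" + e.cause.getMessage()));
        List<Integer> out = new ArrayList<>();
        for (String record : records) {
            Integer value = parseOrNull(record, bus);
            if (value != null) {
                out.add(value);
            }
        }
        return out;
    }
}
```

Because the bus is synchronous, the error details arrive at the subscriber before the null return value reaches the loop, which matches the "at the same time" requirement described above.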

        Lorand Bendig made changes -
        Assignee Lorand Bendig [ lbendig ]
        Russell Jurney added a comment -

        This ticket is the future of data processing. Who do we have to bribe to get this built?

        Dmitriy V. Ryaboy made changes -
        Labels gsoc2012
        Dmitriy V. Ryaboy made changes -
        Field Original Value New Value
        Labels gsoc2012
        Dmitriy V. Ryaboy added a comment -

        Note an extensive discussion of this feature and potential approaches to syntax, implementation, and semantics on this wiki page: http://wiki.apache.org/pig/PigErrorHandlingInScripts

        Dmitriy V. Ryaboy created issue -

          People

          • Assignee:
            Lorand Bendig
            Reporter:
            Dmitriy V. Ryaboy
          • Votes:
            10
            Watchers:
            11

            Dates

            • Created:
              Updated:
