PIG-2620

Customizable Error Handling in Pig

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When a UDF throws an exception, Pig currently fails and stops processing. We want to extend this behavior to give users finer-grained control over error handling.

      Depending on the use case, there are several options users would like to have:

      • Stop the execution and report an error
      • Ignore tuples that cause exceptions and log warnings
      • Ignore tuples that cause exceptions and redirect them to an error relation (to enable statistics, debugging, ...)
      • Write their own error handler
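
The four options above can be illustrated with a small, Pig-independent sketch. This is plain Python rather than Pig's API: `apply_udf`, the `on_error` policy names (`fail`, `warn`, `redirect`, `custom`), and the handler signature are all hypothetical names chosen for illustration, and the "error relation" is modeled as a simple list of (record, exception type, message) tuples.

```python
import logging
from typing import Any, Callable, Iterable, List, Optional, Tuple

logger = logging.getLogger("udf_errors")

# Hypothetical handler signature: gets the failing record and the exception,
# returns a replacement value, or None to drop the record.
ErrorHandler = Callable[[Any, Exception], Optional[Any]]

POLICIES = ("fail", "warn", "redirect", "custom")


def apply_udf(
    records: Iterable[Any],
    udf: Callable[[Any], Any],
    on_error: str = "fail",
    handler: Optional[ErrorHandler] = None,
) -> Tuple[List[Any], List[Tuple[Any, str, str]]]:
    """Apply udf to every record and return (results, error_relation)."""
    if on_error not in POLICIES:
        raise ValueError(f"unknown policy: {on_error}")
    if on_error == "custom" and handler is None:
        raise ValueError("the 'custom' policy needs a handler")

    results: List[Any] = []
    error_relation: List[Tuple[Any, str, str]] = []
    for record in records:
        try:
            results.append(udf(record))
        except Exception as exc:
            if on_error == "fail":
                # Option 1: stop the execution and report the error.
                raise
            if on_error == "warn":
                # Option 2: drop the record and log a warning.
                logger.warning("skipping %r: %s", record, exc)
            elif on_error == "redirect":
                # Option 3: drop the record and keep it in an error relation.
                error_relation.append((record, type(exc).__name__, str(exc)))
            else:
                # Option 4: let user code decide what happens.
                replacement = handler(record, exc)
                if replacement is not None:
                    results.append(replacement)
    return results, error_relation


if __name__ == "__main__":
    good, bad = apply_udf(["1", "2", "x", "4"], int, on_error="redirect")
    print(good)  # records that parsed cleanly
    print(bad)   # one entry describing the record "x"
```

The point of the `redirect` policy in this sketch is that failing records are not lost: they remain available for statistics or debugging alongside the exception that rejected them.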

      1. error_flow.png
        84 kB
        Lorand Bendig
      2. rewrite_example.txt
        50 kB
        Lorand Bendig

        Activity

        Dmitriy V. Ryaboy created issue -
        Dmitriy V. Ryaboy added a comment -

        Note an extensive discussion of this feature and potential approaches to syntax, implementation, and semantics on this wiki page: http://wiki.apache.org/pig/PigErrorHandlingInScripts

        Dmitriy V. Ryaboy made changes -
        Field Original Value New Value
        Labels gsoc2012
        Dmitriy V. Ryaboy made changes -
        Labels gsoc2012
        Russell Jurney added a comment -

        This ticket is the future of data processing. Who do we have to bribe to get this built?

        Lorand Bendig made changes -
        Assignee Lorand Bendig [ lbendig ]
        Lorand Bendig added a comment -

        Based on the discussion on the wiki page I have further elaborated the implementation details. Two files are attached:

        • a possible rewrite of the ONERROR syntax with explain plans
        • a diagram about the error/ignored result propagation back from the EvalFuncs.

        Some notes:

        The ONERROR syntax has been substituted with existing Pig operators, so that the optimizers/visitors can fully understand and process the logical plan.

        The diagram shows publish-subscribe communication between EvalFuncs/POUserFuncs and POForeach, which lets a UDF report back information at the moment it returns null because of an invalid record. This information (the thrown exception, the detail message, etc.) is needed to create the tuple for the error relation.
        Guava's EventBus could be a good candidate for this purpose.

        What do you think?
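
The publish-subscribe idea described in this comment might look roughly like the following. This is a minimal sketch in plain Python, not Pig code and not Guava's EventBus: the `EventBus` class, `UdfErrorEvent`, `make_safe_udf`, and `foreach` are hypothetical stand-ins for the bus, the event payload, the EvalFunc wrapper, and POForeach respectively.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, Iterable, List, Tuple


@dataclass(frozen=True)
class UdfErrorEvent:
    """Hypothetical payload: what the error relation needs about a failure."""
    record: Any
    exception_type: str
    message: str


class EventBus:
    """Tiny in-process bus; a stand-in for a library such as Guava's EventBus."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[type, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, callback: Callable[[Any], None]) -> None:
        self._subscribers[event_type].append(callback)

    def post(self, event: Any) -> None:
        # Deliver synchronously to every subscriber of this event type.
        for callback in self._subscribers[type(event)]:
            callback(event)


def make_safe_udf(udf: Callable[[Any], Any], bus: EventBus) -> Callable[[Any], Any]:
    """Publisher side: return null on failure and post the details to the bus."""
    def wrapped(record: Any) -> Any:
        try:
            return udf(record)
        except Exception as exc:
            bus.post(UdfErrorEvent(record, type(exc).__name__, str(exc)))
            return None
    return wrapped


def foreach(
    records: Iterable[Any], udf: Callable[[Any], Any], bus: EventBus
) -> Tuple[List[Any], List[UdfErrorEvent]]:
    """Subscriber side: route a record to the error relation if an event arrived."""
    pending: List[UdfErrorEvent] = []
    bus.subscribe(UdfErrorEvent, pending.append)
    output: List[Any] = []
    error_relation: List[UdfErrorEvent] = []
    for record in records:
        value = udf(record)
        if pending:
            # The null came from a failure; keep the details, drop the value.
            error_relation.extend(pending)
            pending.clear()
        else:
            # No event was posted, so even a null result is legitimate output.
            output.append(value)
    return output, error_relation


if __name__ == "__main__":
    bus = EventBus()
    out, errors = foreach(["1", "x", "3"], make_safe_udf(int, bus), bus)
    print(out)
    print(errors)
```

In this sketch the event, rather than the null return value, tells the consumer that a record failed, so a UDF that legitimately returns null is not mistaken for an error.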

        Lorand Bendig made changes -
        Attachment error_flow.png [ 12639209 ]
        Attachment rewrite_example.txt [ 12639210 ]
        Qinghao Dai added a comment -

        Is this feature available in 0.8.1 version now?

        Dmitriy V. Ryaboy added a comment -

        Hi Qinghao,
        When looking at tickets in JIRA, to find out whether and in what version they are closed out, you want to look at "resolution" and "fix version". In this case, resolution is "unresolved" meaning this work has not been completed.

        If it was "fixed", you'd be able to check if this is in your version by checking "fix version" – if it's a number equal to or lower than what you are running, you have it.

        It's extremely unlikely that this will ever go into 0.8.1 since the current version is 0.13 (about to be released, and also doesn't have this feature – so far this feature is only a design, there's no real code). 0.8.1 is quite old, you really should upgrade....
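
The "equal to or lower than what you are running" check has one pitfall worth illustrating: version numbers must be compared component by component, not as strings or decimals, or 0.13 looks older than 0.8.1. A minimal sketch, assuming purely numeric dotted versions; `parse_version` and `has_fix` are hypothetical helper names, and the fix version used below is invented for the example, since this issue has no fix version.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.8.1' into (0, 8, 1) so components compare numerically."""
    return tuple(int(part) for part in version.split("."))


def has_fix(fix_version: str, running_version: str) -> bool:
    """True if a fix released in fix_version is present in running_version."""
    return parse_version(fix_version) <= parse_version(running_version)


if __name__ == "__main__":
    # Hypothetical: suppose some fix had "fix version" 0.13.
    print(has_fix("0.13", "0.8.1"))  # numeric comparison of components
    print("0.13" <= "0.8.1")         # naive string comparison, for contrast
```

Comparing the raw strings gets this case wrong because "1" sorts before "8" character by character, while the tuple comparison correctly treats 13 as greater than 8.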

        Qinghao Dai added a comment -

        Hi Dmitriy,
        Thanks for your reply. I am looking for something with similar functionality.
        So I will keep watching this issue.


          People

          • Assignee:
            Lorand Bendig
            Reporter:
            Dmitriy V. Ryaboy
          • Votes:
            11
            Watchers:
            13