Pig
  1. Pig
  2. PIG-3898

Refactor PPNL for non-MR execution engine

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: tez-branch, 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, PPNL assumes the MR plan, and thus, it's not compatible with non-MR execution engine. To support non-MR execution engines, I propose we changed initialPlanNotification() method as follows-

      from
      public void initialPlanNotification(String scriptId, MROperPlan plan);
      
      to
      public void initialPlanNotification(String scriptId, OperatorPlan<?> plan);
      

      Since MROperPlan and TezOperPlan are a subclass of OperatorPlan, this method can take both plans. In addition, if we add a new execution engine in the future, it won't break the interface again as long as we build the operator plan as a subclass of OperatorPlan.

      With this approach, applications such as Ambrose / Lipstick should be able to dynamically cast OperatorPlan to a concrete subclass depending on the ExecType.

      One disadvantage is that this isn't backward compatible with Pig 0.12 and older. But it only requires minor changes, and backward compatibility will be broken one time only.

      I also considered an alternative approach, for example, adding a new PPNL for Tez. But this approach has two problems.

      1. Pig registers PPNL via the Main function, and right now, only one PPNL can be registered. So having more than one PPNLs requires quite a few code changes in Main, ScriptState, and so on.
      2. Multiple PPNL interfaces mean multiple PPNL implementations. This results in more (duplicate) code in applications such as Ambrose / Lipstick.
      1. PIG-3898-1-tez.patch
        52 kB
        Cheolsoo Park
      2. PIG-3898-1-trunk.patch
        42 kB
        Cheolsoo Park

        Activity

        Hide
        Prashant Kommireddi added a comment -

        I'm a +1 on the first approach. It's the right thing to do in this case, in terms of method signature having a base Interface rather than an implementation (OperatorPlan vs MROperPlan). PPNL is marked evolving which tells the the compatibility can be broken between minor releases (though not desired). It's unfortunate that we need to be doing it and users need to adjust, but sooner we make the change the better IMHO.

        Let's also send out a note to Pig user mailing list in case we all agree to go ahead with approach 1.

        Show
        Prashant Kommireddi added a comment - I'm a +1 on the first approach. It's the right thing to do in this case, in terms of method signature having a base Interface rather than an implementation (OperatorPlan vs MROperPlan). PPNL is marked evolving which tells the the compatibility can be broken between minor releases (though not desired). It's unfortunate that we need to be doing it and users need to adjust, but sooner we make the change the better IMHO. Let's also send out a note to Pig user mailing list in case we all agree to go ahead with approach 1.
        Show
        Rohini Palaniswamy added a comment - +1 on first approach. Ambrose and Lipstick are the ones affected. Twitter folks already gave their ok in PIG-3419 for incompatibility w.r.t Ambrose. https://issues.apache.org/jira/browse/PIG-3419?focusedCommentId=13753863&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13753863 https://issues.apache.org/jira/browse/PIG-3419?focusedCommentId=13753912&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13753912 Are there any other known usages of PPNL by others?
        Hide
        Prashant Kommireddi added a comment -

        PPNL is a public interface, so we need to assume it's used by folks in ways we aren't aware. We use it at Salesforce. I remember another user having filed a JIRA around an issue with ScriptState that he discovered with the use of PPNL.

        Show
        Prashant Kommireddi added a comment - PPNL is a public interface, so we need to assume it's used by folks in ways we aren't aware. We use it at Salesforce. I remember another user having filed a JIRA around an issue with ScriptState that he discovered with the use of PPNL.
        Hide
        Prashant Kommireddi added a comment -

        Cheolsoo Park Rohini Palaniswamy on second thoughts, do you feel we should use this as an opportunity to not expose OperatorPlan and instead expose a trimmed down version of it? Exposing OperatorPlan itself seems a bit risky (if we ever plan to change things around later).

        Not sure how it affects the downstream users in terms of the extent to which one would need to make changes in PPNL custom impls.

        Show
        Prashant Kommireddi added a comment - Cheolsoo Park Rohini Palaniswamy on second thoughts, do you feel we should use this as an opportunity to not expose OperatorPlan and instead expose a trimmed down version of it? Exposing OperatorPlan itself seems a bit risky (if we ever plan to change things around later). Not sure how it affects the downstream users in terms of the extent to which one would need to make changes in PPNL custom impls.
        Hide
        Cheolsoo Park added a comment -

        Thank you everyone for your comments. I am uploading my wip patches.

        The changes include-

        1. Update initialPlanNotification() in PPNL as suggested.
        2. Move non-MR specific code from MRScriptState to ScriptState. PIG-3419 moved too much code to MRScriptState, so I am moving back to ScriptState whatever is applicable to both MR and Tez.

        Regarding exposing OperatorPlan in the API, I agree that we should avoid it if possible. But since Ambrose and Lipstick use it heavily, it won't be easy to take it back at this point. Nevertheless, it's definitely worth to discuss.

        Show
        Cheolsoo Park added a comment - Thank you everyone for your comments. I am uploading my wip patches. The changes include- Update initialPlanNotification() in PPNL as suggested. Move non-MR specific code from MRScriptState to ScriptState. PIG-3419 moved too much code to MRScriptState, so I am moving back to ScriptState whatever is applicable to both MR and Tez. Regarding exposing OperatorPlan in the API, I agree that we should avoid it if possible. But since Ambrose and Lipstick use it heavily, it won't be easy to take it back at this point. Nevertheless, it's definitely worth to discuss.
        Hide
        Aniket Mokashi added a comment -

        +1 on the change and the patch.

        Show
        Aniket Mokashi added a comment - +1 on the change and the patch.
        Hide
        Cheolsoo Park added a comment -

        Committed to trunk and tez branch.

        Thank you Aniket for the review!

        Show
        Cheolsoo Park added a comment - Committed to trunk and tez branch. Thank you Aniket for the review!

          People

          • Assignee:
            Cheolsoo Park
            Reporter:
            Cheolsoo Park
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development