PIG-3430

Add xml format for explaining MapReduce Plan.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Pig now supports printing out the MapReduce plan in an xml format. To run:

      pig -e 'explain -xml -script <pigscript>'

      Description

      At Mortar we needed an easy way to store and parse a script's MapReduce plan, so we added an XML output format for the MapReduce plan. We also added a flag that records whether each store or load comes from the original script (and is associated with an alias) or is a temporary store/load generated by Pig.
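
      To illustrate the kind of parsing this enables, here is a minimal Python sketch that splits the loads and stores of a plan into script-defined and Pig-generated ones. The element and attribute names (`mapReducePlan`, `mapReduceNode`, `POLoad`, `POStore`, `isTmp`, `alias`, `location`) are hypothetical stand-ins and not the schema the patch emits; check them against real `explain -xml` output before relying on any of this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample plan; the real XML produced by the patch may be
# structured differently.
SAMPLE_PLAN = """
<mapReducePlan>
  <mapReduceNode>
    <POLoad alias="raw" location="s3://bucket/input" isTmp="false"/>
    <POStore location="hdfs:///tmp/temp-123/tmp-1" isTmp="true"/>
  </mapReduceNode>
  <mapReduceNode>
    <POLoad location="hdfs:///tmp/temp-123/tmp-1" isTmp="true"/>
    <POStore alias="result" location="s3://bucket/output" isTmp="false"/>
  </mapReduceNode>
</mapReducePlan>
"""


def split_by_origin(plan_xml, tag):
    """Return (script_locations, temporary_locations) for elements named `tag`."""
    root = ET.fromstring(plan_xml.strip())
    script, temporary = [], []
    for node in root.iter(tag):
        # Assumed convention: isTmp="true" marks a load/store generated by Pig.
        bucket = temporary if node.get("isTmp") == "true" else script
        bucket.append(node.get("location"))
    return script, temporary


script_loads, tmp_loads = split_by_origin(SAMPLE_PLAN, "POLoad")
script_stores, tmp_stores = split_by_origin(SAMPLE_PLAN, "POStore")
```

      With the sample above, only `s3://bucket/input` and `s3://bucket/output` are reported as belonging to the original script; the intermediate HDFS path shows up on the temporary side for both the store that writes it and the load that reads it.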

      1. PIG-3430-4.patch
        36 kB
        Jeremy Karn
      2. PIG-3430-3.patch
        36 kB
        Jeremy Karn
      3. PIG-3430-2.patch
        35 kB
        Jeremy Karn
      4. PIG-3430.patch
        34 kB
        Jeremy Karn

        Activity

        Jeremy Karn created issue -
        Jeremy Karn made changes -
        Field Original Value New Value
        Attachment PIG-3430.patch [ 12598742 ]
        Jeremy Karn made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Daniel Dai added a comment -

        I can get the XML MapReduce plan with the patch. Two questions:
        1. Is there any reason we only do this for the MapReduce plan?
        2. Why do we need to mark tmpLoader? Is it to support the XML MapReduce plan, or is it a separate thing?

        Jeremy Karn added a comment -

        1. I only added support for the mapreduce plan since that was all that we needed. If there is demand for the other plans I can add that, although I'd prefer to do that in a separate jira since I might not get around to it for a couple of weeks.

        2. The tmpLoader isn't strictly necessary for the XML format, but it's a related code change. We use the XML output of the MapReduce plan to validate input and output data locations. We wanted to be able to distinguish a load of user-generated source data from a load of temporary data generated by a previous job.
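
        A location check of this kind might look roughly like the sketch below. It is not Mortar's validation code: the XML element and attribute names (`POLoad`, `POStore`, `isTmp`, `location`) and the allowed-prefix rule are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def invalid_user_locations(plan_xml, allowed_prefixes):
    """List user-facing load/store locations outside the allowed prefixes.

    Temporary loads and stores are skipped because they point at
    intermediate data that Pig created itself.
    """
    root = ET.fromstring(plan_xml.strip())
    bad = []
    for tag in ("POLoad", "POStore"):  # hypothetical element names
        for node in root.iter(tag):
            if node.get("isTmp") == "true":  # hypothetical attribute
                continue
            location = node.get("location", "")
            if not location.startswith(tuple(allowed_prefixes)):
                bad.append((tag, location))
    return bad


# Hypothetical plan: one valid load, one temporary hand-off, one bad store.
plan = """
<mapReducePlan>
  <POLoad location="s3://team-bucket/in" isTmp="false"/>
  <POStore location="hdfs:///tmp/x" isTmp="true"/>
  <POLoad location="hdfs:///tmp/x" isTmp="true"/>
  <POStore location="s3://other-bucket/out" isTmp="false"/>
</mapReducePlan>
"""
problems = invalid_user_locations(plan, ["s3://team-bucket/"])
```

        Without the temporary flag, the intermediate `hdfs:///tmp/x` path would be reported as a violation even though the user never wrote it in the script.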

        Jeremy Karn made changes -
        Fix Version/s 0.12 [ 12323380 ]
        Jeremy Karn made changes -
        Attachment PIG-3430-2.patch [ 12601415 ]
        Jeremy Karn added a comment -

        New patch that applies cleanly after PIG-3419.

        Jeremy Karn made changes -
        Attachment PIG-3430-3.patch [ 12601451 ]
        Jeremy Karn added a comment -

        Fix two small bugs:

        • Not closing the plan tag when the logical plan is empty
        • The MRExecutionEngine was closing the output streams, even when it wasn't the one that opened them, which meant the closing plan tag wasn't being written out.
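
        The second bug is an instance of a general stream-ownership rule: only the code that opened a stream should close it. The Python sketch below illustrates the rule with hypothetical function names and a hypothetical `<plan>` wrapper tag; it is not the Pig code from the patch.

```python
import io


def write_plan_body(body, out=None):
    """Write `body` to `out`; close the stream only if this function opened it."""
    owns_stream = out is None
    if owns_stream:
        out = io.StringIO()  # stand-in for a file the callee opens itself
    out.write(body)
    if owns_stream:
        result = out.getvalue()
        out.close()
        return result
    return None  # the caller owns `out` and is responsible for closing it


def explain_as_xml(body):
    """Wrap `body` in an opening and closing tag using a caller-owned stream."""
    out = io.StringIO()
    out.write("<plan>")
    write_plan_body(body, out)  # must leave `out` open
    out.write("</plan>")  # raises ValueError if the callee had closed `out`
    text = out.getvalue()
    out.close()
    return text
```

        If `write_plan_body` closed every stream it was handed, the final write in `explain_as_xml` would fail and the closing tag would never be emitted, which matches the symptom described above.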
        Daniel Dai added a comment -

        Thanks Jeremy. I am not worried about the lack of support for the logical/physical plan; I just wanted to make sure the reason was purely a business one.

        +1. I will commit the patch shortly. Can you also put the release notes in the ticket?

        Jeremy Karn made changes -
        Release Note Pig now supports printing out the MapReduce plan in an xml format. To run:

        pig -e 'explain -xml -script <pigscript>'
        Daniel Dai added a comment -

        The line:
        pig.explain("e", "xml", true, false, ps, ps, ps);
        does not compile; it should be changed to:
        pig.explain("e", "xml", true, false, ps, ps, null, null);

        Jeremy Karn made changes -
        Attachment PIG-3430-4.patch [ 12601985 ]
        Jeremy Karn added a comment -

        Fixed.

        Daniel Dai added a comment -

        Patch committed to trunk. Thanks Jeremy!

        Daniel Dai made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Assignee Jeremy Karn [ jeremykarn ]
        Resolution Fixed [ 1 ]
        Daniel Dai made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                    | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open -> Patch Available       | 37s                   | 1               | Jeremy Karn   | 19/Aug/13 12:51
        Patch Available -> Resolved   | 21d 5h 45m            | 1               | Daniel Dai    | 09/Sep/13 18:37
        Resolved -> Closed            | 34d 23h 9m            | 1               | Daniel Dai    | 14/Oct/13 17:46

          People

          • Assignee:
            Jeremy Karn
            Reporter:
            Jeremy Karn
          • Votes:
            0
            Watchers:
            2
