Pig / PIG-3430

Add xml format for explaining MapReduce Plan.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Pig now supports printing out the MapReduce plan in an xml format. To run:

      pig -e 'explain -xml -script <pigscript>'

      Description

      At Mortar we needed an easy way to store and parse a script's MapReduce plan, so we added an XML output format for the MapReduce plan. We also added a flag that records whether each load or store came from the original script (and is associated with an alias) or is a temporary load/store generated by Pig.
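      To make the idea concrete, here is a minimal Java sketch of an XML serializer that records, for every load and store, whether it came from the user's script (with its alias) or is a temporary load/store inserted by Pig. It is not the code from the patch: the element and attribute names (`mapReducePlan`, `load`, `store`, `location`, `alias`, `isTmp`) and the `PlanIO` record are hypothetical, and it uses the JDK's `javax.xml.stream` writer rather than whatever Pig uses internally.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PlanXmlSketch {

    // Hypothetical model of a load or store. kind is "load" or "store";
    // alias is null for temporary loads/stores that Pig generated itself.
    record PlanIO(String kind, String location, String alias, boolean isTmp) {}

    // Serializes the loads/stores under a single root element (hypothetical schema).
    static String toXml(List<PlanIO> ios) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("mapReducePlan");
            for (PlanIO io : ios) {
                w.writeStartElement(io.kind());
                w.writeAttribute("location", io.location());
                if (io.alias() != null) {
                    // Only loads/stores declared in the user's script carry an alias.
                    w.writeAttribute("alias", io.alias());
                }
                w.writeAttribute("isTmp", Boolean.toString(io.isTmp()));
                w.writeEndElement();
            }
            w.writeEndElement(); // closes mapReducePlan even when the list is empty
            w.writeEndDocument();
            w.flush();
            w.close(); // does not close the underlying StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException("failed to write plan XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<PlanIO> ios = List.of(
                new PlanIO("load", "/user/data/input", "raw", false),
                new PlanIO("store", "/tmp/temp-1/tmp-2", null, true));
        System.out.println(toXml(ios));
    }
}
```

      Emitting the flag as an attribute on each element keeps every load/store self-describing, so a consumer can filter out temporary data without reconstructing the job graph.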

      1. PIG-3430.patch (34 kB, Jeremy Karn)
      2. PIG-3430-2.patch (35 kB, Jeremy Karn)
      3. PIG-3430-3.patch (36 kB, Jeremy Karn)
      4. PIG-3430-4.patch (36 kB, Jeremy Karn)

        Activity

        Daniel Dai added a comment -

        I can get the XML MapReduce plan with the patch. Two questions:
        1. Is there any reason we only support this for the MapReduce plan?
        2. Why do we need to mark tmpLoader? Is it needed to support the XML MapReduce plan, or is it a separate change?

        Jeremy Karn added a comment -

        1. I only added support for the MapReduce plan since that was all we needed. If there is demand for the other plans I can add them, although I'd prefer to do that in a separate JIRA since I might not get to it for a couple of weeks.

        2. The tmpLoader isn't strictly necessary for the XML format, but it's a related code change. We use the XML output of the MapReduce plan to validate input and output data locations, and we wanted to be able to distinguish a load of user-generated source data from a load of temporary data generated by a previous job.
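        The validation use case above could look roughly like the following Java sketch, which parses a plan and keeps only the locations of loads (or stores) that came from the user's script. It assumes a hypothetical schema (`load`/`store` elements with `location` and `isTmp` attributes, grouped under `mapReduceJob` elements under a `mapReducePlan` root); Pig's real XML layout may differ, so the tag and attribute names would need to be adjusted to match the actual output of `explain -xml`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class PlanXmlValidator {

    // Returns the locations of all elements named `tag` ("load" or "store") that were
    // declared in the user's script, skipping those marked isTmp="true".
    static List<String> userLocations(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag); // document order, across all jobs
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // getAttribute returns "" when absent, which parseBoolean treats as false.
                if (!Boolean.parseBoolean(e.getAttribute("isTmp"))) {
                    result.add(e.getAttribute("location"));
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse plan XML", ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical two-job plan: job 1 writes temp data that job 2 reads back.
        String xml = "<mapReducePlan>"
                + "<mapReduceJob>"
                + "<load location=\"/user/data/input\" alias=\"raw\" isTmp=\"false\"/>"
                + "<store location=\"/tmp/temp-1/tmp-2\" isTmp=\"true\"/>"
                + "</mapReduceJob>"
                + "<mapReduceJob>"
                + "<load location=\"/tmp/temp-1/tmp-2\" isTmp=\"true\"/>"
                + "<store location=\"/user/data/output\" alias=\"result\" isTmp=\"false\"/>"
                + "</mapReduceJob>"
                + "</mapReducePlan>";
        System.out.println("inputs to validate:  " + userLocations(xml, "load"));
        System.out.println("outputs to validate: " + userLocations(xml, "store"));
    }
}
```

        Treating a missing `isTmp` attribute as user data errs on the side of validating more locations rather than fewer.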

        Jeremy Karn added a comment -

        New patch that applies cleanly after PIG-3419.

        Jeremy Karn added a comment -

        Fixed two small bugs:

        • The plan tag was not closed when the logical plan was empty.
        • The MRExecutionEngine closed the output streams even when it had not opened them, so the closing plan tag was never written out.
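        Both fixes follow a common pattern: a writer should emit its closing tag unconditionally, and it should close only the streams it opened itself. A minimal Java sketch of that pattern follows; `ExplainOutputSketch`, `writePlan`, and `explain` are hypothetical names, not the actual MRExecutionEngine code, and operator names are assumed to need no XML escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ExplainOutputSketch {

    // Writes the plan wrapped in <plan> tags. The closing tag is written even
    // when the operator list is empty.
    static void writePlan(List<String> operators, PrintStream out) {
        out.println("<plan>");
        for (String op : operators) {
            out.println("  <operator>" + op + "</operator>");
        }
        out.println("</plan>");
        out.flush();
    }

    // Uses the caller's stream if one is given; otherwise opens (and later closes) a file.
    // A stream supplied by the caller is never closed here.
    static void explain(List<String> operators, PrintStream callerStream, String fileName) {
        boolean ownsStream = (callerStream == null);
        PrintStream out;
        if (ownsStream) {
            try {
                out = new PrintStream(fileName);
            } catch (FileNotFoundException e) {
                throw new UncheckedIOException(e);
            }
        } else {
            out = callerStream;
        }
        try {
            writePlan(operators, out);
        } finally {
            if (ownsStream) {
                out.close(); // we opened it, so we close it
            }
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(buf, true, StandardCharsets.UTF_8);
        ps.println("<explain>");
        explain(List.of(), ps, null); // empty plan, caller-owned stream
        ps.println("</explain>");     // still written because explain() left ps open
        System.out.print(buf.toString(StandardCharsets.UTF_8));
    }
}
```

        Because `explain` leaves the caller's stream open, the outer `</explain>` line in `main` is still written. Had it closed `ps`, that later write would be silently dropped, since `PrintStream` records errors instead of throwing them, which mirrors the missing closing tag described in the second bug.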
        Daniel Dai added a comment -

        Thanks Jeremy. I'm not worried about the lack of support for the logical and physical plans; I just wanted to make sure it was purely a business decision.

        +1. I will commit the patch shortly. Can you also put the release notes in the ticket?

        Daniel Dai added a comment -

        The line:
        pig.explain("e", "xml", true, false, ps, ps, ps);
        does not compile; it should be changed to:
        pig.explain("e", "xml", true, false, ps, ps, null, null);

        Jeremy Karn added a comment -

        Fixed.

        Daniel Dai added a comment -

        Patch committed to trunk. Thanks Jeremy!


          People

          • Assignee:
            Jeremy Karn
            Reporter:
            Jeremy Karn
          • Votes:
            0
            Watchers:
            2
