Pig / PIG-113

Make Grunt's explain output more understandable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.1.0
    • Component/s: grunt
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      I think it would be better if we could display the execution plan in a more understandable way. One intuitive way to do this is to show the output as a tree, as SQL Server does.

      Possibly we could have 'AS <format>' as an optional argument to the explain command.

      For example

      Grunt> explain bag1 AS tree ;
      Grunt> explain bag1 AS xml ;
      

      and

      Grunt> explain bag1   
      

      will display the default format

      I have attached a patch that generates the tree output.

      Here is a sample of the existing output format:

      Logical Plan:
      Group root-Sun Feb 17 19:37:07 GMT+10:00 2008-5
      Object id: 9814147
      Inputs: 26335425 
      Schema: (group, (sum, (), (), ()))
      EvalSpecs:
              Generate: has 2 children
                      Project: (0)
                      Star
      Split root-Sun Feb 17 19:37:07 GMT+10:00 2008-2
      Object id: 25199001
      Inputs: 29132923 
      Schema: (sum, (), (), ())
      EvalSpecs:
      Eval root-Sun Feb 17 19:37:07 GMT+10:00 2008-1
      Object id: 29132923
      Inputs: 10774273 
      Schema: (sum, (), (), ())
      EvalSpecs:
              Generate: has 4 children
                      FuncEval: name: org.apache.pig.impl.builtin.ADD args:
                              Generate: has 2 children
                                      Project: (0)
                                      Project: (1)
                      Project: (0)
                      Project: (1)
                      Project: (2)
      Load root-Sun Feb 17 19:37:07 GMT+10:00 2008-0
      Object id: 10774273
      Inputs: 
      Schema: ()
      EvalSpecs:
      -----------------------------------------------
      Physical Plan:
      MAPREDUCE
      Object id: 17671659
      Inputs: 682933706
      Map: 
              Star
      Grouping Funcs: 
              Generate: has 2 children
                      Project: (0)
                      Star
      Input Files: /tmp/temp678140026/tmp1867058340
      MAPREDUCE
      Object id: 17308974
      Inputs: 
      Map: 
              Composite: has 2 children
                      Star
                      Generate: has 4 children
                              FuncEval: name: org.apache.pig.impl.builtin.ADD args:
                                      Generate: has 2 children
                                              Project: (0)
                                              Project: (1)
                              Project: (0)
                              Project: (1)
                              Project: (2)
      Input Files: /tmp/data1.txt
      Output File: /tmp/temp678140026/tmp1613817084
      

      Here is a sample of my tree output, which is more compact and easier to understand:

      grunt> explain c1 as tree ;
      Logical Plan:
      |---LOCogroup ( GENERATE {[PROJECT $0],[*]} ) 
            |---LOSplitOutput (  ) 
                  |---LOSplit ( ([PROJECT $0] < ['5']),([PROJECT $0] >= ['5']) ) 
                        |---LOEval ( GENERATE {[org.apache.pig.impl.builtin.ADD(GENERATE {[PROJECT $0],[PROJECT $1]})],[PROJECT $0],[PROJECT $1],[PROJECT $2]} ) 
                              |---LOLoad ( file = /tmp/data1.txt )
      -----------------------------------------------
      Physical Plan:
      |---POMapreduce
          Map : *
          Grouping : Generate(Project(0),*)
          Input File(s) : /tmp/temp678140026/tmp1867058340
            |---POMapreduce
                Map : Composite(*,Generate(FuncEval(org.apache.pig.impl.builtin.ADD(Generate(Project(0),Project(1)))),Project(0),Project(1),Project(2)))
                Input File(s) : /tmp/data1.txt
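
For illustration only, the indentation scheme in the logical plan sample above can be sketched in a few lines of Python. This is not the code in the attached patch: `PlanNode` and `render_tree` are hypothetical names, the operator labels are shortened, and the six-space indent per level is simply read off the sample.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan operator: a label plus the operators that feed it."""
    label: str
    inputs: List["PlanNode"] = field(default_factory=list)


def render_tree(node: PlanNode, depth: int = 0, indent: int = 6) -> List[str]:
    """Return one line per operator; each input is indented one level deeper."""
    lines = [" " * (indent * depth) + "|---" + node.label]
    for child in node.inputs:
        lines.extend(render_tree(child, depth + 1, indent))
    return lines


# Shortened version of the logical plan from the sample (labels abbreviated).
plan = PlanNode("LOCogroup", [
    PlanNode("LOSplitOutput", [
        PlanNode("LOSplit", [
            PlanNode("LOEval", [
                PlanNode("LOLoad ( file = /tmp/data1.txt )"),
            ]),
        ]),
    ]),
])

print("\n".join(render_tree(plan)))
```

A recursive walk like this keeps each operator on one line, which is where most of the compactness over the existing format comes from.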
      

      I'm also thinking about adding XML output, as it might benefit people who are working on displaying the execution plan in a GUI.

      1. pig_printtree_2.patch
        17 kB
        Pi Song
      2. pig_printtree_1.patch
        22 kB
        Pi Song

        Activity

        Pi Song added a comment -

        The patch that prints the tree.

        Pi Song added a comment -

        No suggestions or comments at all?

        Alan Gates added a comment -

        In general the patch looks good. Making the explain output more readable is something we need.

        There's one question I have that I'd like to get input from others on. In the patch you've made the arguments to EXPLAIN tokens in the language (XML, TREE). That's a standard SQL approach. The pro is that it is easy for users to type, and SQL users probably already think about things that way. The con is that it bloats the number of tokens in the language (take a look at all the tokens in the SQL standard compared to the number of tokens in a language like Java), and it means many changes will include changes to the parser.

        The other option is to make EXPLAIN take a string argument, so it would be EXPLAIN 'tree' instead of EXPLAIN TREE. This has the reverse pros and cons. Another pro is that Java (and similar) programmers may think of this as a more natural model.

        Thoughts?

        Pi Song added a comment -

        I understand that you don't want to blow up the number of tokens.

        1. We won't end up having many display formats. Tree and XML are all I can think of.
        2. In fact, I would prefer the following:

        explain bag1 = displays the tree by default, because it's the most intuitive way to look at a plan
        explain bag1 as raw = displays the current output

        explain bag1 as xml = *** not even sure this is needed for text-mode grunt

        Alan Gates added a comment -

        I know we won't have too many explain formats. It's more a general philosophy question for the language. As we add other new constructs to the language we'll face the same question, and we want to answer it consistently.

        One other comment. I don't see any particular reason to preserve the raw format. I threw that together quickly simply because I needed it, and as I was the only one using it at the time readability was not a concern. AFAIK no one is dependent on the original format.

        Benjamin Reed added a comment -

        I agree with Alan about not adding more tokens. I don't think we need the quotes though, unless we think we might have a format name that collides with a keyword. "explain bag1 as xml" will be parsed as EXPLAIN identifier AS identifier. We can just treat the second identifier as the format name rather than as an ordinary identifier.
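
A minimal sketch of that idea, in Python for illustration only. Pig's real parser is not shown in this thread, so the regular expression, `parse_explain`, `ExplainCommand`, and the `KNOWN_FORMATS` set are all assumptions; the format names are the ones mentioned in the discussion. The point is that the format is validated after parsing, as an ordinary identifier, so no new reserved words are needed.

```python
import re
from typing import NamedTuple

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# EXPLAIN identifier [AS identifier] [;]  (the second identifier is optional)
EXPLAIN_RE = re.compile(
    rf"^\s*explain\s+(?P<alias>{IDENT})(?:\s+as\s+(?P<fmt>{IDENT}))?\s*;?\s*$",
    re.IGNORECASE,
)

# Assumed set of format names; checked after parsing, not in the grammar.
KNOWN_FORMATS = {"tree", "raw", "xml"}


class ExplainCommand(NamedTuple):
    alias: str
    fmt: str


def parse_explain(line: str, default_fmt: str = "tree") -> ExplainCommand:
    """Parse an explain command, treating the format as a plain identifier."""
    match = EXPLAIN_RE.match(line)
    if match is None:
        raise ValueError(f"not an explain command: {line!r}")
    fmt = (match.group("fmt") or default_fmt).lower()
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unknown explain format: {fmt!r}")
    return ExplainCommand(match.group("alias"), fmt)


print(parse_explain("explain bag1 as xml ;"))
print(parse_explain("explain bag1"))
```

Adding a new format under this scheme would only mean extending the set of known names, with no change to the grammar.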

        Olga Natkovich added a comment -

        Can we get some agreement on this so that we can move this patch forward? Alan? Pi? Ben?

        Pi Song added a comment -

        My conclusion from the discussion above, then, is to switch "explain bag1" to my new format so that we don't introduce any new tokens. Let's forget about the XML format for the time being; we can create a new issue when we need it. Let's just move this forward first so that we can debug execution plans more easily.

        Agree?

        Alan,
        If you have time, could you please write down the design philosophy behind the Pig language on the wiki?

        Alan Gates added a comment -

        I think Pi's idea is good. We can get explain working in a nice way and push off the language question for now.

        Olga Natkovich added a comment -

        Clearing the patch available flag until a new patch is available that does what Pi suggested.

        Pi Song added a comment -

        This patch switches the normal explain statement to use my walk tree implementation without introducing new tokens.

        Somebody please mark this as patch available.

        Alan Gates added a comment -

        Pi's tree output was checked in at revision 632546. Thanks, Pi, for the patch.


          People

          • Assignee:
            Pi Song
            Reporter:
            Pi Song