Hadoop Common / HADOOP-1894

Add fancy graphs for mapred task statuses

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels:
      None

      Description

      I would like to add graphics for mapred task statuses.

      1. fancygraph_v1.0.patch
        22 kB
        Enis Soztutar
      2. percentage.png
        61 kB
        Enis Soztutar
      3. mapreduce.png
        42 kB
        Enis Soztutar
      4. fancygraph_v1.1.patch
        23 kB
        Enis Soztutar
      5. fancygraph_v1.2.patch
        24 kB
        Enis Soztutar
      6. fancygraph_v1.3.patch
        24 kB
        Enis Soztutar

        Issue Links

          Activity

          Enis Soztutar added a comment -

          This patch adds three kinds of graphs.

          1. The percentage graph for dfs / mapred UI
          2. map completion graph
          3. reduce completion graph

          The map/reduce completion graphs resemble the ones in Google's MapReduce. There is one bar for every map and reduce task. The bar is shown in three different colors depending on the phase of the reduce task (copy/sort/merge). The map/reduce graphs are shown on the jobdetails page.

          They are in SVG format. Firefox shows them without any plugin, but I'm not sure about IE. There is a plugin called Adobe SVG Viewer for IE and Firefox.

          Check out the screenshots!

          Enis Soztutar added a comment -

          Attaching the screenshot for percentage graph

          Enis Soztutar added a comment -

          Attaching the screenshot for MR task graphs.

          Tom White added a comment -

          Nice. I wonder whether we could use sparklines to keep the display very compact (on the summary page), perhaps with click through to a more detailed graph? There's an Apache-licensed sparklines library here which includes a servlet for plotting datapoints: http://www.representqueens.com/spark/.

          Enis Soztutar added a comment -

          I should check that.

          Enis Soztutar added a comment -

          Tom, do you think the sparklines would be useful? I am somewhat reluctant to add a dependency.

          Jeff Hammerbacher added a comment -

          (attaching to ticket instead of just sending to hadoop-dev--my apologies; thought that would get automatically added to ticket)

          enis, great patch--been dying for these graphs since seeing that jeff dean presentation (http://www.slideshare.net/jhammerb/mapreduce-pact06-keynote ). tom, i'm not sure where sparklines make sense in this context?

          also, some comments:
          1) changing order of import statements is unnecessary
          2) several whitespace changes made it into this patch – on purpose?
          3) what is "serialVersionUID"? it's set but never used, from what i can tell.

          jeff

          Enis Soztutar added a comment -

          > 1) changing order of import statements is unnecessary

          The patch removes imports ending with * and uses the actual class names instead.

          > 2) several whitespace changes made it into this patch - on purpose?

          I do not like whitespace-only changes. They occur because some lines contain stray whitespace, which gets removed during development. Although we do not like them, avoiding them is not a strict Hadoop policy.

          > 3) what is "serialVersionUID"? it's set but never used, from what i can tell.

          Serializable classes should have a serialVersionUID. See the javadoc for further info.
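
To illustrate the serialVersionUID answer above: the field is never referenced by application code because the Java serialization runtime looks it up reflectively when writing and reading objects. The following is a minimal sketch with a hypothetical class (`GraphPoint` and its fields are invented for illustration and are not taken from the patch).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration only; not part of the patch.
class GraphPoint implements Serializable {
  // Looks unused, but the serialization runtime reads it reflectively to
  // check that the writer and the reader agree on the class version.
  private static final long serialVersionUID = 1L;

  final int taskIndex;
  final float progress;

  GraphPoint(int taskIndex, float progress) {
    this.taskIndex = taskIndex;
    this.progress = progress;
  }

  /** Serializes a point to bytes and reads it back. */
  static GraphPoint roundTrip(GraphPoint p) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(p);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (GraphPoint) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

If the field is omitted, the runtime derives a default value from the class structure, so an otherwise compatible change to the class can make previously serialized data unreadable.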

          Tom White added a comment -

          > do you think the sparklines would be useful?

          I was thinking that sparklines for map and reduce completion could be added to each row of the Running Jobs table on the summary page. It would be a nice way to see if any jobs are getting stuck, or slow, compared to other ones. Just a thought.

          > I am somewhat reluctant to add a dependency.

          It's a small 17K jar file - nothing to worry about really.

          Enis Soztutar added a comment -

          The number of map / reduce tasks could be on the order of thousands. Using sparklines, we could not generate graphics that big, but we could add one bar for, say, 100 tasks. But then, first, we could not monitor each task, and second, the average progress for 100 reduces may not be interpretable (considering the three phases).

          I would like two things to be resolved. First, I would be very glad if someone could check the graphics in IE. Second, we may wish to be able to disable the graphics somehow, so that a graph for 10000+ map tasks will not get in the way. Any thoughts?

          eric baldeschwieler added a comment -

          This looks very interesting!

          Enis Soztutar added a comment -

          OK, I have tested this in IE6 and it worked with Adobe SVG Viewer, but the percentage graphs are not so elegant. I would hardly consider this a major issue, though.

          I have added open / close links to enable/disable the graphs. The info is kept in the session. I have also improved the logic for segmentation in x axis.

          Doug Cutting added a comment -

          I like this! Before we commit it however we should do something reasonable for jobs with many thousands of tasks.

          Perhaps, when there are greater than, e.g., 700 tasks, each pixel column would represent a group of tasks (numTasks/700), with the value of the pixel proportional to the number of tasks that have that degree of completion. So, if 50% of the tasks in the group are 10% done, then the ten-percent pixel would be half-bright. Would that be too expensive to compute?

          Short of that, we could page through the tasks, as we do elsewhere. So the first image would contain the first 700 tasks, with a "next" button to get the image for the next 700 tasks, and so on.

          Or, we could simply disable the display for jobs with more than a certain number of tasks, but that'd be a shame.
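
The grouping idea in the first suggestion above can be sketched as follows. This is only one reading of the proposal, not code from the patch: `TaskHeatmap`, `brightness`, and the `levels` parameter are hypothetical names, and 700 is the example column count from the comment.

```java
// Hypothetical sketch of the "one pixel column per group of tasks" idea.
class TaskHeatmap {
  // Example limit taken from the comment; not a value from the patch.
  static final int MAX_COLUMNS = 700;

  /**
   * Returns brightness[column][level] in [0, 1]: the fraction of tasks in
   * that column's group whose progress falls into that completion level.
   * progress values are expected to lie in [0, 1].
   */
  static double[][] brightness(float[] progress, int levels) {
    int n = progress.length;
    int columns = Math.min(n, MAX_COLUMNS);
    double[][] result = new double[columns][levels];
    int[] groupSize = new int[columns];
    for (int i = 0; i < n; i++) {
      // Spread the n tasks evenly over the available columns.
      int col = (int) ((long) i * columns / n);
      int level = Math.min(levels - 1, (int) (progress[i] * levels));
      result[col][level] += 1;
      groupSize[col]++;
    }
    for (int c = 0; c < columns; c++) {
      for (int l = 0; l < levels; l++) {
        result[c][l] /= groupSize[c]; // every column receives at least one task
      }
    }
    return result;
  }
}
```

The cost is a single pass over the tasks plus one pass over the columns, so it should not be expensive even for tens of thousands of tasks. With two tasks per column, one at 15% and one at 95%, the matching two pixels of that column would each be half-bright.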

          Owen O'Malley added a comment -

          This is very cool.

          I'd like the grouping together of (#tasks/700) tasks into a single bar. I'd much rather have a single overview page than the precise number of each task.

          Enis Soztutar added a comment -

          Thanks!
          I have also thought about more than one task per bar, and was going to implement it, but then I gave up on the idea. The reason is that the progress of a reduce task spans three phases (0-0.33, 0.33-0.66, 0.66-1.0), and I normalize the progress within each phase to the 0-100 interval. If we average over a few reduce tasks, the averaged progress (in the interval 0-1) will not correspond to the actual phases. For example, if we average over 0.0, 0.60 and 0.90, the bar would show 0.50, which falls in the sort phase, although actually only one of the tasks is in sort.

          I am in favor of keeping the graph compact, but I still cannot figure out how to group tasks together. Any ideas?
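
The averaging problem described above can be reproduced in a few lines. This is a sketch for illustration only: `PhaseAveraging`, `phaseOf`, and `average` are hypothetical names, and the phase boundaries are the thirds given in the comment.

```java
// Hypothetical sketch of why a plain average hides the per-task phases.
class PhaseAveraging {
  enum Phase { COPY, SORT, REDUCE }

  /** Maps an overall reduce progress in [0, 1] to its phase. */
  static Phase phaseOf(double progress) {
    if (progress < 1.0 / 3) return Phase.COPY;
    if (progress < 2.0 / 3) return Phase.SORT;
    return Phase.REDUCE;
  }

  /** Plain arithmetic mean; returns 0 for an empty group. */
  static double average(double... values) {
    if (values.length == 0) return 0.0;
    double sum = 0.0;
    for (double v : values) {
      sum += v;
    }
    return sum / values.length;
  }
}
```

For the three tasks in the example, `phaseOf` gives COPY, SORT and REDUCE individually, while `phaseOf(average(0.0, 0.60, 0.90))` gives SORT, so a single averaged bar would misrepresent two of the three tasks.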

          eric baldeschwieler added a comment -

          You could stack the three phases and use shade, or make each phase a second progress bar that shows what % of tasks have completed that phase. I kind of like the idea of stacking 3 colors with each growing from 0% length to 33%. That should work visually.

          Of course we are more interested in outliers than every task, so maybe one could just show net aggregate progress (a histogram of 10% complete increments showing the % of tasks in each). This would look like a stereo graphic equalizer with the color flowing from bars on the left to bars on the right as the task progresses. Then we could just break out the 5 slowest, 5 median and 5 fastest tasks and show them in more detail (task ID and % progress).
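
A rough sketch of the histogram idea above, assuming task progress is available as values in [0, 1]. The class and method names are hypothetical, and only the "slowest" breakout is shown; the median and fastest groups would be picked from the same sorted index.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of the "graphic equalizer" histogram with an outlier list.
class ProgressHistogram {
  /** Returns the percentage of tasks in each 10% completion bucket. */
  static double[] percentPerBucket(double[] progress) {
    double[] pct = new double[10];
    if (progress.length == 0) return pct;
    for (double p : progress) {
      int bucket = Math.min(9, (int) (p * 10)); // progress 1.0 lands in the last bucket
      pct[bucket] += 100.0 / progress.length;
    }
    return pct;
  }

  /** Returns the indices of the k tasks with the lowest progress, slowest first. */
  static int[] slowest(double[] progress, int k) {
    Integer[] idx = new Integer[progress.length];
    for (int i = 0; i < idx.length; i++) {
      idx[i] = i;
    }
    Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> progress[i]));
    int n = Math.min(k, idx.length);
    int[] out = new int[n];
    for (int i = 0; i < n; i++) {
      out[i] = idx[i];
    }
    return out;
  }
}
```

This view has a fixed width of ten bars regardless of the number of tasks, which is what makes it attractive for very large jobs.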

          Enis Soztutar added a comment -

          Here is the updated patch,

          fixed graph size to maximum 600px. Now a bar can show more than one task. The reduce bars show a stack display of copy / sort / reduce phases, and for each phase the average progress is computed.
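
One way the per-phase averaging for a stacked bar could work is sketched below. This is an assumption about the approach, not the code in the patch: `StackedReduceBar` and `phaseAverages` are hypothetical names, and each phase is assumed to cover one third of a task's overall progress.

```java
// Hypothetical sketch: per-phase average progress for one bar's group of reduces.
class StackedReduceBar {
  static double clamp01(double v) {
    return Math.max(0.0, Math.min(1.0, v));
  }

  /**
   * Takes the overall progress (in [0, 1]) of every reduce task in one bar's
   * group and returns {copy, sort, reduce} average completion, each in [0, 1].
   */
  static double[] phaseAverages(double[] group) {
    double[] avg = new double[3];
    if (group.length == 0) return avg;
    for (double p : group) {
      for (int phase = 0; phase < 3; phase++) {
        // Progress inside this phase: 0 before it starts, 1 once it is finished.
        avg[phase] += clamp01(p * 3 - phase);
      }
    }
    for (int phase = 0; phase < 3; phase++) {
      avg[phase] /= group.length;
    }
    return avg;
  }
}
```

For tasks at 0.0, 0.60 and 0.90 this yields roughly 0.67 for copy, 0.60 for sort and 0.23 for reduce, so the stacked bar keeps the per-phase information that a single averaged value would lose.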

          Doug Cutting added a comment -

          This looks good to me. I tried it in pseudo-distributed mode with up to 2000 tasks and it seems to work well. Can someone please test this on a larger cluster, e.g., with the sort benchmark?

          Mukund Madhugiri added a comment -

          I will test it with the sort 500 benchmark

          Owen O'Malley added a comment -

          It doesn't work in the case of # of reduces = 0, the jsp throws a divide by 0 exception.

          Other than that, it works great.
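
The exact expression that fails in the JSP is not shown in this thread. As a sketch of the general failure mode, any integer layout calculation that divides by the task count needs a guard for the zero-task case. The names below are hypothetical, and 600 is the maximum graph width mentioned for the updated patch.

```java
// Hypothetical sketch of a layout calculation guarded against zero tasks.
class BarLayout {
  // Maximum graph width in pixels, as mentioned for the updated patch.
  static final int MAX_WIDTH = 600;

  /** Tasks represented by one bar, or 0 when there is nothing to draw. */
  static int tasksPerBar(int numTasks) {
    if (numTasks <= 0) return 0;
    return (numTasks + MAX_WIDTH - 1) / MAX_WIDTH; // ceiling division
  }

  /** Number of bars in the graph, or 0 when there is nothing to draw. */
  static int barCount(int numTasks) {
    int perBar = tasksPerBar(numTasks);
    if (perBar == 0) return 0; // without this guard the next line divides by zero
    return (numTasks + perBar - 1) / perBar;
  }

  /** Width of one bar in pixels, or 0 when there is nothing to draw. */
  static int barWidth(int numTasks) {
    int bars = barCount(numTasks);
    return bars == 0 ? 0 : MAX_WIDTH / bars;
  }
}
```

A caller would then skip rendering the reduce graph entirely when `barCount` returns 0.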

          Owen O'Malley added a comment -

          On a current mac book pro with firefox, the graph makes my browser pretty unresponsive. I checked and the data size isn't that big (44k). Does anyone have any ideas on making it more responsive?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12366534/fancygraph_v1.2.patch
          against trunk revision r579353.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/825/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/825/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/825/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/825/console

          This message is automatically generated.

          Doug Cutting added a comment -

          > On a current mac book pro with firefox, the graph makes my browser pretty unresponsive.

          What version of Firefox? It's pretty zippy on Ubuntu with Firefox 2.0.

          Arun C Murthy added a comment -

          Enis, first and foremost - it's pretty awesome.

          Nits: While running on a cluster of ~500 nodes, the maps' graph was fine, but the reduces' graph was slightly off. It showed all the reducers hitting 33.33% quite early, though the shuffle was still on... maybe a minor math error?

          Enis Soztutar added a comment -

          The graph size may become big since we use xml to encode every little bar and for a graph the upper limit of number of rectangles is 3 * 600, but that should not be a problem.

          I will fix 0 reducer issue and check reduce graph progress calculation. Arun could you give me some more insight?

          meanwhile i would really appreciate more tests since i currently use only 3 nodes in my cluster.
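
To make the size estimate above concrete, here is a sketch of how stacked bars might be encoded as SVG, one `rect` element per phase per bar. It is not the patch's rendering code: the class name, the input layout (`bars[x]` holding the copy, sort and reduce completion in [0, 1]) and the colors are all assumptions.

```java
import java.util.Locale;

// Hypothetical sketch: three SVG rectangles per bar, stacked bottom to top.
class SvgBars {
  // Arbitrary colors chosen for illustration.
  static final String[] COLORS = {"green", "orange", "blue"};

  /** bars[x] = {copy, sort, reduce} completion in [0, 1] for pixel column x. */
  static String render(double[][] bars, int height) {
    StringBuilder sb = new StringBuilder();
    sb.append(String.format(Locale.ROOT,
        "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"%d\" height=\"%d\">\n",
        bars.length, height));
    double segment = height / 3.0; // each phase may fill one third of the height
    for (int x = 0; x < bars.length; x++) {
      for (int phase = 0; phase < 3; phase++) {
        double h = bars[x][phase] * segment;
        double y = height - phase * segment - h; // SVG y grows downward
        sb.append(String.format(Locale.ROOT,
            "<rect x=\"%d\" y=\"%.1f\" width=\"1\" height=\"%.1f\" fill=\"%s\"/>\n",
            x, y, h, COLORS[phase]));
      }
    }
    sb.append("</svg>\n");
    return sb.toString();
  }
}
```

With at most 600 columns this emits at most 3 * 600 = 1800 `rect` elements, each a few dozen bytes, which is in line with the sizes in the tens of kilobytes reported in this thread.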

          Arun C Murthy added a comment -

          The thing I noticed is that the reduces graph (green lines) all hit 33% even though actual progress made was more like 12% or 13% i.e. shuffle was less than half-done. I'll try and get you some more specific info if you need, let me know what I should be looking for...

          Owen O'Malley added a comment -

          The size of the XML, while larger than a bitmap, isn't excessive (36-100k).

          It is Firefox 2.0.0.4, so it isn't that old.

          The shuffle was at 17% in the text box (both in aggregate and in most tasks) and the graph was showing the shuffle at 33% (ie done). There are 37600 maps and 950 reduces.

          Owen O'Malley added a comment -

          This screenshot shows the maps half done while the graph shows the copy at 33% (i.e., done).

          Enis Soztutar added a comment -

          Fixed the division by zero exception (the reduce graph is simply not shown in that case).
          Fixed the progress calculation. (I had forgotten to remove a multiplication by 3 after changing the graph style, which went undetected due to rounding.)

          Owen, I use Firefox 2.0.0.2 on Ubuntu and experienced no difficulty. What do you think we can do about the Mac? Will the patch be acceptable as it is now, or should we consider disabling the graphs by default?

          eric baldeschwieler added a comment -

          I don't think they are so slow that we need to reject them.

          Let's make sure they work OK on Windows with Firefox and that the IE7 experience is not really bad. A notification that you need a plugin is fine; lots of clutter on the screen would be bad.

          This is a great starting point. I'm sure we're going to see a lot of evolution from here. There is no requirement it be perfect. Only that it not cause major problems.

          Doug Cutting added a comment -

          > There is no requirement it be perfect. Only that it not cause major problems.

          +1 Can anyone report on how this looks in IE? If that's okay then I think we're ready to commit.

          Owen O'Malley added a comment -

          On IE it just showed up as the unknown data type icon, which seems fine. I'm ok for committing it.

          Enis Soztutar added a comment -

          I have tried the patch in IE6 with Adobe SVG Viewer and it is OK. I will add a note to the FAQ about this after the patch makes it into trunk.

          Enis Soztutar added a comment -

          I committed this.

          Hudson added a comment -

          Integrated in Hadoop-Nightly #252 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/252/ )

            People

            • Assignee:
              Enis Soztutar
              Reporter:
              Enis Soztutar
            • Votes:
              1
              Watchers:
              1
