  1. Cassandra
  2. CASSANDRA-7918

Provide graphing tool along with cassandra-stress

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 3.2
    • Component/s: Tools
    • Labels:
      None

      Description

      Whilst cstar makes some pretty graphs, they're a little limited and also require you to run your tests through it. It would be useful to be able to graph results from any stress run easily.

      1. reads.svg
        322 kB
        Benedict
      2. 7918.patch.txt
        289 kB
        Ryan McGuire

        Issue Links

          Activity

          jeromatron Jeremy Hanna added a comment -

          Is there any reason why this couldn't go back as far as 2.1?

          JoshuaMcKenzie Joshua McKenzie added a comment -

          Committed to trunk as e4467a0f6d3f9c616fa6f3fc3e51c99aa3925878. Thanks Ryan!

          enigmacurry Ryan McGuire added a comment -

          Joshua McKenzie I've rebased, and made the changes you mentioned, thanks for the review!

          https://github.com/EnigmaCurry/cassandra/tree/7918_review_notest

          JoshuaMcKenzie Joshua McKenzie added a comment -

          ping Ryan McGuire (don't want you to have to pay a rebase-tax on this if you don't have to)

          JoshuaMcKenzie Joshua McKenzie added a comment -

          Squashed your commits and touched up some style nits while reviewing - pushed here

          Patch looks solid. I didn't go in-depth reviewing the .html file included, but as for the java side I have a couple of outstanding questions:

          • MultiPrintStream has quite a few non-overridden base (write*/print*) methods that would allow writing to the base stream without writing to the additional streams. I'd recommend we override all print/write methods to make them multiStream compatible.
          • In StressAction, the following looks incorrect to me:
            mismatch
            if (settings.rate.minThreads > 0)
            {
               output.println("Thread count was not specified, testing multiple thread counts");
            

            settings.rate.minThreads looks to indicate our preferred min thread-count which that output message contradicts.

          Just about good to commit once we get those two minor things ironed out. Thanks for sticking with this!
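
One way to address the first point without overriding each method by hand is to put the fan-out in a plain OutputStream and wrap it in a single PrintStream, so that every print and write call reaches all targets by construction. The sketch below is only an illustration of that idea: `TeeOutputStream` and its `printStream` factory are hypothetical names, and this is not the MultiPrintStream from the patch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

// Hypothetical helper, not the MultiPrintStream from the patch.
// All bytes written to this stream are copied to every target stream.
final class TeeOutputStream extends OutputStream {
    private final OutputStream[] targets;

    TeeOutputStream(OutputStream... targets) {
        this.targets = targets.clone();
    }

    @Override
    public void write(int b) throws IOException {
        for (OutputStream t : targets) t.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        for (OutputStream t : targets) t.write(buf, off, len);
    }

    @Override
    public void flush() throws IOException {
        for (OutputStream t : targets) t.flush();
    }

    @Override
    public void close() throws IOException {
        for (OutputStream t : targets) t.close();
    }

    // Wraps the tee in one auto-flushing PrintStream, so every print and
    // write method of that PrintStream fans out to all targets.
    static PrintStream printStream(OutputStream... targets) {
        return new PrintStream(new TeeOutputStream(targets), true);
    }
}
```

With this shape, something like `TeeOutputStream.printStream(System.out, logFile)` could stand in for a multi-stream printer, at the cost of losing per-target error handling.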

          michaelsembwever mck added a comment -

          Used this to demonstrate expected performance loss when turning on full audit logging in DSE, and selective audit logging using a logback filter.
          Incredibly useful, thanks Ryan McGuire and Benedict.

          enigmacurry Ryan McGuire added a comment -

          Joshua McKenzie Rebased on trunk, retested, and force pushed: https://github.com/enigmacurry/cassandra/tree/7918-stress-graph

          JoshuaMcKenzie Joshua McKenzie added a comment -

          SGTM. Could I get this branch rebased and Patch Available when it's ready?

          enigmacurry Ryan McGuire added a comment -

          We don't have any bandwidth to work on #9870 right now. I'm fine committing this in stages, or waiting, but #9870 likely won't happen until after 3.0 ships. #9870 is more focussed on the visual aspect than the results capturing that this handles.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          Ryan McGuire: Is this ticket superseded by CASSANDRA-9870?

          From Shawn Kumar's comment over there:

          I have built off what Ryan had already written out, but changes were quite significant since the code was previously pretty much limited to displaying raw metrics and organized for that purpose

          I'm fine with reviewing this ticket if you want to rebase it or with us waiting and just doing the entire thing w/9870.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          I'll review Ryan's actual code in a few and see if it applies cleanly to trunk.

          Still needs actual code review before commit. I plan on getting to it sometime next week.

          jeromatron Jeremy Hanna added a comment -

          Can this be committed then? Just didn't know if it was waiting on anything else.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          I think CASSANDRA-9870 covers it. I'll review Ryan's actual code in a few and see if it applies cleanly to trunk.

          jeromatron Jeremy Hanna added a comment -

          Joshua McKenzie was there another ticket that you'd want for tracking the history (logs and settings) for the run in the output, or something like that?

          Otherwise, can this be committed/resolved?

          benedict Benedict added a comment -

          Filed CASSANDRA-9870

          enigmacurry Ryan McGuire added a comment -

          I have a guy starting next week that is familiar with JS.

          Benedict Can you please create the extra tickets? I think you're the best one to describe what your other charts consist of.

          benedict Benedict added a comment -

          WFM then. Ryan McGuire: does anyone on your team have the bandwidth to address that on that kind of timescale?

          JoshuaMcKenzie Joshua McKenzie added a comment -

          Either that or we commit as is and assign somebody the task of improving the graphs, and they commit to delivering them within the next two months (meaning we may get them before Christmas), say.

          That's what I had in mind. We don't commit / close this ticket until follow-up tickets are created and assigned.

          benedict Benedict added a comment -

          I'm not sure my opinion should carry as much weight here, however I'll restate what I've said privately: my concern is that this has languished for getting on for a year, and if we commit without improving the graphs the little pressure there is will vanish, and we'll all make do for another year or more. Committing doesn't buy the project much, but getting better graphs sooner buys it quite a lot. Not committing perhaps costs the users who would like this now a little, but potentially buys them the better decision making of improved graphs sooner in trade.

          To me that makes delay a net win, but that's only my finger in air wild assumptions.

          TL;DR: I'd prefer we harnessed the little pressure we're getting for this, by delaying commit until we can improve the graphs, so we're encouraged to do so sooner.

          Either that or we commit as is and assign somebody the task of improving the graphs, and they commit to delivering them within the next two months (meaning we may get them before Christmas), say.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          I'd prefer follow-up tickets focus on adding the extra information from Benedict's approach first and then we move to the historical, as the GC and latency information is critical along w/the throughput.

          I'm fine w/reviewing Ryan's current delta (after rebase to 2.2) and committing with just the information we have with the caveat/reinforcement that I feel pretty strongly that we need the extra information that Benedict's graphs offer us.

          Benedict - thoughts?

          jeromatron Jeremy Hanna added a comment -

          Benedict Joshua McKenzie what do you guys think of committing this as is and using follow-up tickets to address the historical stuff?

          enigmacurry Ryan McGuire added a comment -

          Historical logs of stress would be great, but for graphing purposes, I think you still need an expressive grammar for describing what those logs are for and how they relate; in my code, that's what the revision and title options are for. I would envision that log directory not as a flat directory containing a series of logs, but rather as subdirectories containing groupings of related logs.

          jeromatron Jeremy Hanna added a comment -

          It sounds like we could possibly commit this as-is and create a follow-on ticket to have a bundle of the config for historical purposes (i.e. to remember what options we used for the run). Would that be agreeable to the parties involved?

          jeromatron Jeremy Hanna added a comment -

          Couldn't it create the verbose stuff anyway and still generate the graph like Ryan's stuff currently does? I too would love for this to get unstuck.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          ping Ryan McGuire

          JoshuaMcKenzie Joshua McKenzie added a comment -

          Fair point. My initial thought was 1 arg to name the output file (i.e. name for the test you're doing) and the rest passed through to stress, but as you said it's not a pressing question.

          benedict Benedict added a comment -

          How about something like a verbose_stress.sh

          SGTM. Although I'm not such a fan of datetime naming - they need to be prohibitively long to get uniqueness, and are really ugly to parse (mentally). Might prefer a mix of date + short ascii hash. Not exactly a pressing question though.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          I think the general concern is that maintaining a code-base with gnuplot in it isn't something your fellow contributors are thrilled about, not the potential difficulty of a user interacting with it.

          How about something like a verbose_stress.sh that dumps current commit sha, yaml settings, and stress args to a file, passes all args through to cassandra-stress.* and appends the stress output to that file, then compresses the final results to an archive named w/datetime stamp? Some simple section delimiters and our graph generator could parse that trivially.

          Avoids the coupling w/stress, keeps the collection of metadata and test output as a separate logical entity, and we get our canonical source of truth.
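
To make the "simple section delimiters" idea concrete, here is a minimal sketch of how a graph generator might split such a combined file into named sections. Everything specific in it is an assumption for illustration: the `=== NAME ===` delimiter format, the section names, and the `RunLogParser` class are hypothetical and not part of cassandra-stress or the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for a sectioned run log. The "=== NAME ===" delimiter
// is an assumed convention, chosen only for this illustration.
final class RunLogParser {
    // Returns section name -> lines of that section, in file order.
    // Lines that appear before the first delimiter are ignored.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : lines) {
            String trimmed = line.trim();
            boolean isHeader = trimmed.length() > 8
                    && trimmed.startsWith("=== ")
                    && trimmed.endsWith(" ===");
            if (isHeader) {
                String name = trimmed.substring(4, trimmed.length() - 4);
                current = sections.computeIfAbsent(name, k -> new ArrayList<>());
            } else if (current != null) {
                current.add(line);
            }
        }
        return sections;
    }
}
```

A wrapper script would then only need to emit one header line before the commit SHA, the yaml settings, the stress arguments, and the stress output respectively.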

          benedict Benedict added a comment -

          It's worth pointing out that the user doesn't have to ever touch gnuplot; it compiles scripts for gnuplot, and shells out itself.

          I don't have any specific attachment to it, though, and if we can get the same info via some other means I'm thrilled. My ideal world would be one with graphs akin to those I produced with gnuplot, but in javascript, with interactive buttons most especially for turning on/off certain aspects of the graph, so that they can more easily be viewed. For instance, adding/removing specific branches, or latency bands.

          I think stress should output all of the settings it receives if -log level=verbose is provided. However I'm not sure we want to tightly couple stress to the cassandra.yaml or the SHA. The approach I took was to parse a stress output, so if we standardise our performance tests to always run stress in verbose mode, the output file can become the canonical source of truth, and the graph generated on the fly. Perhaps we can SHA the output file, and store it in its entirety somewhere, inside a zip containing the cassandra.yaml, so that the graph can just contain this hash of the output file to route us to the permanent record?
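
As a rough sketch of the "SHA the output file" idea, the snippet below hashes the bytes of a stress output and derives a short archive name from a date plus the first characters of the hash. The choice of SHA-256, the eight-character prefix, the `.zip` naming pattern, and the `OutputHash` class are all assumptions made for illustration; the thread does not specify any of them.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the algorithm and naming scheme are assumptions.
final class OutputHash {
    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Builds a name such as "<date>-<first 8 hex chars>.zip".
    static String archiveName(String date, byte[] content) {
        return date + "-" + sha256Hex(content).substring(0, 8) + ".zip";
    }
}
```

The graph would then only need to embed the full digest to point back at the archived record.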

          JoshuaMcKenzie Joshua McKenzie added a comment -

          Given our recent regressions and the upcoming effort for a performance testing harness, we need to move on this.

          Right now we have 1) Benedict's option that has more information but that's written using gnuplot which people feel strongly against and 2) Ryan's option that has less information available but is perhaps more immediately / intuitively digestible and doesn't use gnuplot.

          Ryan McGuire: what are the chances you could integrate the throughput/latency/gc and tri-graphing approach Benedict took into the existing cstar framework, giving us the best of both worlds? I wouldn't mind seeing the current format of the #'s from your solution below the graphs.

          One other thing - we need to scrape the cassandra.yaml file and dump out the relevant settings used for the test (or perhaps just all of them at the outset) as well as snapshotting the specific cassandra-stress command used to generate the test results for reproduction. A SHA for commit used on the test would also help, and I think that would give us a solid initial framework to start testing with and have reproducible tests.

          We can pursue future additions onto this later (capturing system info, /proc/cpuinfo, etc) but there's no point in holding it up to get it to be perfect for our 1st revision.

          benedict Benedict added a comment - - edited

          FTR, gnuplot does (apparently) work on Windows

          edit: to avoid hunting around inside CASSANDRA-7282, I've uploaded the read comparison graph to this ticket

          JoshuaMcKenzie Joshua McKenzie added a comment -

          What are we stuck on with this ticket right now? I want perf graphs for Windows testing so let's un-stick this.

          Given the testing improvements being worked on right now, having some tool like this in-repo will only help make that effort go more smoothly. Ariel Weisberg: have .02 to throw in on the topic?

          enigmacurry Ryan McGuire added a comment -

          I still would like to incorporate others' suggestions, but as I'm still using this patch myself, I'm updating my existing patch merged with latest trunk changes (my github branch is up to date too).

          jshook Jonathan Shook added a comment -

          It would be nice to have the ability to send metrics from the client to common monitoring systems. It would be especially nice if you could simply use the same reporter configuration format that you can already use for wiring Cassandra to other monitoring systems, like graphite. For serious users of stress, this would be the preferred approach to capturing results.

          enigmacurry Ryan McGuire added a comment -

          fwiw, I've updated my branch again to fix the case where you run without threadcounts specified. It automatically breaks it out into multiple runs with " - X threads" appended to the revision name.

          example: http://ryanmcguire.info/ds/jira/7918-multi-threads.html

          I'll give your comments some more thought Benedict, thanks.

          benedict Benedict added a comment -

          My plane journey was spent manically trying various graphing options to give everything you need to assess a branch in one view, and clearly. I'd hate that to go to waste. The new patch as it stands only produces the graphs we've always got - I'd like to see cstar and our bundled tool produce better graphs. Each one of the graphs in the gnuplot output is designed to let you see more information; it's all normalised, coloured and scattered so you can distinguish the results at each moment in time and overall. Too often with the web output I have to simply glance at the "average" to tell what's going on (or guess-and-peck numbers for zooming in), and have to click at each different stat which is laborious (and, let's be honest, we don't do it thoroughly, we just peck at a few... or perhaps I'm lazier than everyone else).

          To elaborate on the alternative, there are ten graphs in one view in the gnuplot version, scaled so you can tell everything they want you to know without clicking once. The left-most of each graph normalises each moment of each run against the base run, so that variability can be easily broken down across the run. The middle graph plots the raw data so you can get a feel for its shape, and the final graph plots the median, quartiles and deciles. The latencies are all plotted with selected scatters / lines to make it easy to distinguish which p-range we're looking at, even when they cross. GC is also plotted specially as a cumulative run, since this teases out differences much more clearly also.
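
The three transformations described above (normalising a run against the base run, summarising by median, and plotting GC cumulatively) are simple to state in code. The sketch below is an illustration only: `SeriesTransforms` is a hypothetical class, it assumes both runs are sampled at the same intervals, and it is not taken from the gnuplot scripts or the patch.

```java
import java.util.Arrays;

// Hypothetical helpers illustrating the transformations described above.
final class SeriesTransforms {
    // Ratio of each sample to the base run's sample at the same index.
    // Assumes aligned sampling; a zero base sample yields NaN.
    static double[] normalise(double[] run, double[] base) {
        int n = Math.min(run.length, base.length);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = base[i] == 0 ? Double.NaN : run[i] / base[i];
        }
        return out;
    }

    // Running total, e.g. for plotting GC time as a cumulative curve.
    static double[] cumulative(double[] values) {
        double[] out = new double[values.length];
        double sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            out[i] = sum;
        }
        return out;
    }

    // Median of the samples; NaN for an empty input.
    static double median(double[] values) {
        if (values.length == 0) return Double.NaN;
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Quartiles and deciles would follow the same sort-then-index pattern as the median, with whichever interpolation rule the graphing tool prefers.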

          I have nothing against discarding the gnuplot approach, but I'd like to see whatever solution we produce deliver really great graphs that allow us to make decisions more easily and more accurately. Right now I'd prefer to put the gnuplot work into cstar than the other way around. Though I can tell the hatred for it runs deep!

          enigmacurry Ryan McGuire added a comment - - edited

          I've made more progress on this; everything from my spec above is now implemented on my branch.

          One bug so far is when you don't specify a thread count for reads, it doesn't know how to parse the multiple iterations at different thread counts (it mashes them all together into one graph). As long as you specify the thread count, you're fine.

          Here's a test I ran:

          cassandra-stress write n=1000000 -rate threads=100 -graph file=test7918.html title=test revision=test1
          cassandra-stress read n=1000000 -rate threads=100 -graph file=test7918.html title=test revision=test1
          
          cassandra-stress write n=1000000 -rate threads=100 -graph file=test7918.html title=test revision=test2
          cassandra-stress read n=1000000 -rate threads=100 -graph file=test7918.html title=test revision=test2
          
          cassandra-stress write n=1000000 -rate threads=100 -graph file=test7918.html title=test revision=test3
          cassandra-stress read n=1000000 -rate threads=100 -graph file=test7918.html title=test revision=test3
          

          And it produced this single 282K html file:

          http://ryanmcguire.info/ds/jira/test7918.html

          enigmacurry Ryan McGuire added a comment -

          I got pretty far with this before I got sidetracked again.

          https://github.com/EnigmaCurry/cassandra/tree/7918-stress-graph

          This implements the command line interface, captures the metrics to a temporary file, and embeds the graphing JavaScript as a Java resource. What's still left to do: dump the metrics into the HTML file, and handle the merge case where the HTML file already exists.
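
          The remaining step, dumping the metrics into the HTML file, could look roughly like the sketch below. It is written in Python for brevity (the branch does this inside stress itself), and the template, the `stress-data` element id, and the `render_report` name are all assumptions made for illustration, not taken from the branch.

```python
import html
import json

# Hypothetical template: the real report also inlines jQuery, d3.js and the
# graphing JavaScript. Only the embedded-data part is sketched here.
TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<script type="application/json" id="stress-data">{data}</script>
</body>
</html>
"""


def render_report(title, metrics):
    """Return one self-contained HTML page with `metrics` embedded as JSON."""
    # Escape "</" so a string in the payload cannot close the script tag early;
    # "<\/" is still valid JSON and decodes back to "</".
    payload = json.dumps(metrics).replace("</", "<\\/")
    return TEMPLATE.format(title=html.escape(title), data=payload)
```

          Because the data travels inside the page, the result opens straight from disk with no server involved.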

          jbellis Jonathan Ellis added a comment -

          Sounds pretty clean to me.

          enigmacurry Ryan McGuire added a comment -

          Here's how I'm planning on attacking this:

          • I've refactored my original graph tool from cstar_perf into a single HTML file (includes embedded jQuery, d3.js, and other dependencies) - example here
          • Add a new option to stress called -graph
            • -graph takes a required parameter file= to specify the HTML report file to generate.
            • Stress will record the metrics it collects (intervals) to a temporary file.
            • When done, it will write out the HTML with the intervals converted into a JSON format embedded inside.
            • If the file already exists, it will first load the existing JSON data and merge it with the new data (which adds these new operations to the dropdown as it goes.)
            • Originally, this tool was meant to compare two or more different configurations, so we can retain that behavior by adding an optional parameter to -graph called revision=. Revision is just a unique name to give to your particular cluster configuration. The same revision can be reused across multiple runs, and the new metrics will be merged into the JSON for that revision as described above.
            • When finished, the temporary metrics file is cleaned up.

          Example to run a write, then a read, on two different clusters, generating a single report to compare the two:

          # First cluster:
          cassandra-stress write n=19000000 -node cluster1 -graph file=compare_clusters.html revision=cluster1
          cassandra-stress read n=19000000 -node cluster1 -graph file=compare_clusters.html revision=cluster1
          # Second cluster:
          cassandra-stress write n=19000000 -node cluster2 -graph file=compare_clusters.html revision=cluster2
          cassandra-stress read n=19000000 -node cluster2 -graph file=compare_clusters.html revision=cluster2
          

          The resulting compare_clusters.html will contain all four runs, and the file can be loaded into any modernish web browser with no web server or internet connection required.
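
          The merge behaviour planned above can be sketched as follows. This is a minimal Python illustration, not the code on the branch: the nested layout (revision, then operation), the `merge_run` name, and the choice to let a repeated operation replace the earlier one are all assumptions.

```python
def merge_run(report, revision, operation, intervals):
    """Return a copy of `report` with one more stress run merged in.

    `report` maps revision name -> operation name -> list of intervals.
    """
    # Copy two levels deep so the caller's existing data is left untouched.
    merged = {rev: dict(ops) for rev, ops in report.items()}
    # Assumption: re-running an operation for a revision replaces the old data.
    merged.setdefault(revision, {})[operation] = intervals
    return merged


# The four runs from the example: two operations on each of two clusters.
report = {}
for revision in ("cluster1", "cluster2"):
    for operation in ("write", "read"):
        report = merge_run(report, revision, operation, intervals=[])
```

          Keying on the revision name is what lets separate invocations of stress accumulate into one report.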

          enigmacurry Ryan McGuire added a comment -

          Pretty easily. Right now it's written such that it needs to do an AJAX request to get the data to plot. I usually just run it with 'python -m SimpleHTTPServer', but if we wanted it to be even slicker than that we could have the tool embed the data and javascript directly into a dynamically generated html file which could be loaded in a browser without needing the server component.

          Also, it can plot just about any time-based metric, not just stress. I've successfully used it to plot results from YCSB, for instance.
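
          A side note on the local server mentioned above: SimpleHTTPServer is the Python 2 module name, and on Python 3 the equivalent command is 'python3 -m http.server'. The same thing done programmatically is sketched below; `serve_directory` is a name invented for this example, and the code assumes Python 3.7 or later.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=0):
    """Serve `directory` over HTTP on localhost; port=0 picks a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run in a daemon thread so the caller keeps control of the main thread.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

          The chosen port is available as server.server_address[1]; call server.shutdown() when done.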

          jbellis Jonathan Ellis added a comment -

          The more I think about it, the less of a fan I am about maintaining two separate stress-graphing code bases.

          Ryan McGuire, how hard would it be to extract cstar's graph generator into something we could call from the command line?

          benedict Benedict added a comment -

          Well, I developed this on a 12-hr flight, so doing it with something I'd need Google to achieve wasn't an option, although d3.js has some strengths. I really dislike Python, however, and have found that every time I try to use a scripting language to develop a tool it would be considered more suitable for, I waste time doing so. I am very productive in Java. I did not want to spend longer on this than necessary, so I stuck with Java this time.

          I'm very happy with the end result - these graphs are extremely informative. With a single glance a lot of comparisons can easily be made. If somebody wants to develop something of equal utility with different tools, that's fine by me, but in the meantime I don't see why that should prevent this being made use of. Since this is an ad hoc tool, dropping it in favour of a future improvement is easily done; it's a non-breaking change.

          brandon.williams Brandon Williams added a comment -

          gnuplot in general is awful. Ryan McGuire has that tool that does cool rendering using D3; maybe we could just drop HTML with the JSON data embedded and let JS do the rest.

          jbellis Jonathan Ellis added a comment -

          Gut reaction: driving gnuplot with Java is kind of awful; wouldn't this be better done in Python?

          benedict Benedict added a comment -

          Patch available here

          See CASSANDRA-7282 for sample output. This patch relies upon gnuplot.


            People

            • Assignee:
              enigmacurry Ryan McGuire
              Reporter:
              benedict Benedict
              Reviewer:
              Joshua McKenzie
            • Votes:
              2
              Watchers:
              14
