Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 3.0 alpha 1
    • Component/s: Configuration
    • Labels: None

      Description

      See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281

      May want to default 2.1 to G1.

      2.1 is a different animal from 2.0 after moving most of the memtables off-heap. Suspect this will help G1 even more than CMS. (NB: this is off by default but needs to be part of the test.)
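
      For illustration, switching the default collector in conf/cassandra-env.sh amounts to swapping the CMS flags for a G1 line; the CMS flags shown are an abbreviated sketch of the stock settings, not the exact list:

      # before (CMS, abbreviated sketch):
      # JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
      # JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
      # JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
      # after (G1):
      JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"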


          Activity

          jbellis Jonathan Ellis added a comment -

          /cc Rick Branson Benedict

          benedict Benedict added a comment -

          We need to make sure we test over an extended period with a variety of operations being exercised against the cluster. This is probably a good opportunity to try to define a real-world burn-in test as well, and decide what parameters should be included.

          Some things to consider:

          1. Range of data distributions, including (esp. for this) large partitions and very large cells. Possibly run two or three parallel stress profiles with very different data profiles to really give GC a headache dealing with different velocities / lifetimes.
          2. Incremental and full repairs
          3. Hint accumulation / node death
          4. Tombstones / Range Tombstones
          5. Secondary indexes?

          I'd suggest ignoring some variables, e.g. sticking with just netty, so we can define a single complex workload, run it for an extended period, and get a good result. While our client buffers behave quite differently with each, I'm happy tuning defaults for native now that it's faster.

          It might also be useful, for this test only, to see for a single node how well the two degrade as heap pressure increases, by artificially consuming large portions of the heap for the duration of a more simple stress test.

          tjake T Jake Luciani added a comment -

          See also https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase (100GB heaps!)

          rbranson Rick Branson added a comment -

          Mad anecdotes:

          We ran with G1 enabled for around 4 days in a 33-node cluster running 1.2.17 on JDK7u45 that has around a 1:5 read:write ratio. We tried a few different configurations with short durations, but most of the time we ran it with the out-of-the-box G1 configuration on a 20G heap and 32 parallel GC threads (16 core, 32 hyperthreaded). There were some somewhat scary bugs fixed in 7u60 that ultimately caused me to roll back to the CMS collector after the experiment.

          • The experiment pointed out that our young gen was basically too small and was pulling latency up significantly. When we returned back to CMS, I doubled new size from 800M -> 1600M. We had moved to new hardware and hadn't taken the time to sit down and play with GC settings. This cut our mean latency dramatically as perceived from the client, ~50% for writes and ~30% for reads, similar to what we saw with G1. I was quite thrilled with this result.
          • I tried both 100ms and 150ms pause time targets with 12G, 16G, and 20G heaps, and while these resulted in slightly lower mean latency (~5-10%), Mixed GC activity caused P99s to suffer greatly. There's compelling evidence that the 200ms default is nearly ideal for the way the G1 algorithm works in its current incarnation.
          • We basically needed a 20G heap to make G1 work well for us, since by default G1 will use up to half of the max heap for eden space and Cassandra needs quite a large old gen to stay happy. G1 appears to need a much larger eden space to work efficiently, sizes that would make ParNew die in a fire. GCs of the eden space were impressively fast, with a ~10G eden space taking ~120ms on average to collect.
          • G1's huge eden space was helpful working around some issues with compaction on hints CF which had dozens of very wide partitions, hundreds of thousands of cells each.
          • Overall, at the default 200ms pause time target, we didn't see much of an increase in CPU usage over CMS.

          In the end, my tests basically told us that G1 requires a larger heap to get the same results with far less tuning. If there are GC issues, it seems like in the vast majority of cases G1 can either eliminate them or make it easy to work around them by cranking up the heap size. Someone should probably test G1 with a variable-sized heap since it's designed to give back RAM when it thinks it doesn't need it. That might or might not actually work. While we didn't test this, a configuration of G1 + heap size min of 1/8 RAM and max of 1/2 RAM might make a really nice default for Cassandra at some point.
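
          A rough sketch of that min/max heap idea, assuming a Linux host (it reads /proc/meminfo) and the JVM_OPTS convention used by cassandra-env.sh; illustrative only, not a tested configuration:

          # total RAM in MB from /proc/meminfo (Linux-only assumption)
          system_memory_mb=$(awk '/MemTotal/ { printf "%d", $2 / 1024 }' /proc/meminfo)
          # min heap = 1/8 of RAM, max heap = 1/2 of RAM, per the suggestion above
          min_heap_mb=$((system_memory_mb / 8))
          max_heap_mb=$((system_memory_mb / 2))
          JVM_OPTS="$JVM_OPTS -Xms${min_heap_mb}M -Xmx${max_heap_mb}M -XX:+UseG1GC"

          With Xms below Xmx the JVM is at least allowed to shrink the heap; whether G1 actually returns memory promptly is the open question raised above.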

          jeromatron Jeremy Hanna added a comment -

          Any update on this testing, Shawn Kumar? Just wondering, as this ticket seemed promising initially but hasn't been updated in some time.

          atobey@datastax.com Albert P Tobey added a comment -

          I managed to get G1 (Java 8) to beat CMS on both latency and throughput on my NUC cluster.

          Preliminary results: https://gist.github.com/tobert/ea9328e4873441c7fc34

          snazy Robert Stupp added a comment -

          Nice

          Looking forward to seeing Oracle JVM and C* 2.1 results. (TBH, I don't expect much difference between OpenJDK 8 and Oracle JDK 8.)

          More interesting would be how G1 behaves with read and mixed workloads.

          atobey@datastax.com Albert P Tobey added a comment -

          So far my testing of read workloads matches my experience with writes. An 8GB heap with generic G1GC settings is "good" for more workloads out of the box than haphazardly tuned CMS can be. I've been testing on a mix of Oracle/OpenJDK and JDK7/8 and the results are fairly consistent across the board with the exception that performance is a tad higher (~5%) on JDK8 than JDK7 (with G1GC - I have not tested CMS much on JDK8).

          These parameters get better throughput than CMS out of the box with significantly improved consistency in the max and p99.9 latency.

          -Xmx8G -Xms8G -XX:+UseG1GC

          If throughput is more critical than latency, the following will get a few % more throughput at the cost of potentially higher max pause times:

          -Xmx8G -Xms8G -XX:+UseG1GC -XX:MaxGCPauseMillis=2000 -XX:InitiatingHeapOccupancyPercent=75

          My recommendation is to document the last two options in cassandra-env.sh but leave them disabled/commented out for end-users to fiddle with. Other knobs for G1 didn't make a statistically measurable difference in my observations.
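
          For illustration, the commented-out documentation described above might look something like this in cassandra-env.sh (a sketch, not the actual patch):

          # G1 targets a 200ms pause by default. If throughput matters more than
          # latency, uncomment to relax the pause target and start concurrent
          # marking later, at the cost of potentially longer pauses:
          # JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=2000"
          # JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=75"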

          G1 scales particularly well with heap size on huge machines. 8 to 16GB doesn't seem to make a big difference, matching what Rick Branson saw. At 24GB I started seeing about 8-10% throughput increase with little variance in pause times.

          IMO the simple G1 configuration should be the default for large heaps. It's simple and provides consistent latency. Because it uses heuristics to determine the eden size and scanning schedule, it adapts well to diverse environments without tweaking. Heap sizes under 8GB should continue to use CMS or even experiment with serial collectors (e.g. Raspberry Pi, t2.micro, vagrant). If there is interest, I will write up a patch for cassandra-env.sh to make the auto-detection code pick G1GC at >= 6GB heap and CMS for < 6GB.
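
          A minimal sketch of that heap-size-based selection, assuming a max_heap_size_in_mb variable (hypothetical name) has already been computed earlier in cassandra-env.sh; the 6GB threshold is the one proposed above:

          if [ "$max_heap_size_in_mb" -ge 6144 ]; then
              # 6GB and up: G1 with default settings
              JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
          else
              # smaller heaps: keep the existing CMS settings
              JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
          fi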

          jbellis Jonathan Ellis added a comment -

          Albert P Tobey wdyt of mstump's suggestions at CASSANDRA-8150?

          jbellis Jonathan Ellis added a comment -

          Also, how much better is CMS for small heaps? Given that sub-8GB heaps aren't particularly common or recommended, can we just simplify it to "use G1?"

          atobey@datastax.com Albert P Tobey added a comment -

          I'll kick off some tests and find out. All of the Oracle docs say not to bother below 6GB, but yeah I agree, if it's basically not bad we should go with simple.

          atobey@datastax.com Albert P Tobey added a comment -

          This is with 2.0 / OpenJDK8 since that's what I had running. Same everything each run except for heap size. cassandra-stress 2.1.4 read workload / 800 threads. I'll re-run with 2.1 / Oracle JDK8 and some mixed load.

          -XX:+UseG1GC

          Also: -XX:+UseTLAB -XX:+ResizeTLAB -XX:-UseBiasedLocking -XX:+AlwaysPreTouch but maybe those should go in a different ticket.

          8GB:

          op rate : 139805
          partition rate : 139805
          row rate : 139805
          latency mean : 5.7
          latency median : 4.2
          latency 95th percentile : 13.2
          latency 99th percentile : 18.5
          latency 99.9th percentile : 21.1
          latency max : 303.8

          512MB:

          op rate : 114214
          partition rate : 114214
          row rate : 114214
          latency mean : 7.0
          latency median : 3.7
          latency 95th percentile : 12.4
          latency 99th percentile : 14.7
          latency 99.9th percentile : 15.3
          latency max : 307.1

          256MB:

          op rate : 60028
          partition rate : 60028
          row rate : 60028
          latency mean : 13.3
          latency median : 4.0
          latency 95th percentile : 44.7
          latency 99th percentile : 73.5
          latency 99.9th percentile : 79.6
          latency max : 1105.4

          Same everything with mostly stock CMS settings for 2.0. I added the -XX:+UseTLAB -XX:+ResizeTLAB -XX:-UseBiasedLocking -XX:+AlwaysPreTouch settings to keep the numbers comparable to all of my other data.

          8GB/1GB:

          op rate : 119155
          partition rate : 119155
          row rate : 119155
          latency mean : 6.7
          latency median : 4.1
          latency 95th percentile : 11.8
          latency 99th percentile : 15.5
          latency 99.9th percentile : 17.3
          latency max : 520.2

          512MB ( -XX:+UseAdaptiveSizePolicy):

          op rate : 82375
          partition rate : 82375
          row rate : 82375
          latency mean : 9.7
          latency median : 4.3
          latency 95th percentile : 28.2
          latency 99th percentile : 49.4
          latency 99.9th percentile : 54.8
          latency max : 2642.6

          256MB ( -XX:+UseAdaptiveSizePolicy):

          op rate : 77705
          partition rate : 77705
          row rate : 77705
          latency mean : 10.3
          latency median : 4.8
          latency 95th percentile : 33.6
          latency 99th percentile : 45.3
          latency 99.9th percentile : 49.1
          latency max : 1990.0

          atobey@datastax.com Albert P Tobey added a comment - - edited

          My benchmarks completed. These were run on 6 quad-core Intel NUCs with 16GB RAM / 240GB SSD / gigabit ethernet. The CPUs are fairly slow at 1.3Ghz i5-4250U. Cassandra 2.1.4 / Oracle JDK 8u40 / CoreOS 647.0.0 / Linux 3.19.3 (bare metal - no container). The tests were automated with a complete cluster rebuild between tests and caches dropped before starting Cassandra each time.

          The big win with G1 IMO is that it is auto-tuning. I've been running it on a few other kinds of machines and it generally does much better with more CPU power.

          cassandra-stress was run with an increased heap but is otherwise unmodified from Cassandra 2.1.4. I checked the gc log regularly and did not see many pauses for stress itself above 1ms here & there, with most pauses in the ~300usec range. The three stress nodes I had available are all quad-cores: i7-2600/3.4Ghz/8GB, Xeon-E31270/3.4Ghz/16GB, i5-4250U/1.3Ghz/16GB.

          These were saturation tests. In all but the G1 @ 256MB test the stress runs were stable and the systems' CPUs were at 100% pretty much the whole time. The numbers smooth out a lot for all of the combinations of GC settings at more pedestrian throughput. I will kick that off when I get a chance, which will be ~2 weeks from now.

          The final output of the stress is available here:

          https://docs.google.com/a/datastax.com/spreadsheets/d/19Eb7HGkd5rFUD_C0ZALbK6-R4fPF9vJRr8BrvxBwo38/edit?usp=sharing
          http://tobert.org/downloads/cassandra-2.1-cms-vs-g1.csv

          The stress commands, system.log, GC logs, conf directory from all the servers, and full stress logs are available on my webserver here:

          http://tobert.org/downloads/cassandra-2.1-cms-vs-g1-data.tar.gz (35MB)

          jbellis Jonathan Ellis added a comment -

          Sounds like G1 is finally ready to replace CMS. WDYT Matt Stump Rick Branson Ariel Weisberg?

          rbranson Rick Branson added a comment -

          I think it definitely makes sense as a default. My guess is that it'll result in fewer headaches for most people.

          aweisberg Ariel Weisberg added a comment -

          For the C* we ship today we should evaluate whether G1 is better. For the platonic ideal C* (where the heap only needs to be 1 or 2 gig) I suspect we should ship CMS because I found it has lower baseline pause times for young gen collections especially on server class hardware.

          This is something that we can do in a data-driven way. Albert P Tobey got some good data, but when I sampled throughput (from the spreadsheet) on some workloads like the 12g CMS and G1 I saw more throughput under CMS. I think we should munge the data a bit and visualize throughput and P99 (or P99.9). I am also not a fan of basing the decision on that number of cores and a non-NUMA machine, which is not representative of the hardware people use.

          I am not comfortable with the measurements for large heaps because if I am reading correctly there was pretty never an old generation collection under the workload I looked at. The old gen was growing but never reached the point it needed to do an old gen GC. It's great the server can run that long with so little promotion (TIL that is a thing that happens). That explains the very long young gen pauses. Lots of survivor copying I guess when I look at the size of survivor set vs pause time. I saw young gen pauses in the 400+ millisecond range under both collectors.

          Another behavior to consider is worst case pause time when there is fragmentation.

          With all the overhead of survivor copying I start to wonder if a valid strategy would be to allow promotion and let the concurrent collector run all the time. That would bring down young-gen GC pauses in exchange for throughput.

          I think whether 8099 means no off-heap memtables in 3.0 is also a factor. If G1 scales to larger heaps and larger on heap memtables then it will be a better choice.

          yangzhe1991 Phil Yang added a comment - - edited

          I think we should be prudent and careful about changing the default. Usually for C* users it is acceptable if a new version performs no better; however, it may be unacceptable if the new version performs worse. If there is a risk that in some cases G1 is worse than CMS, it may be better to make G1 an optional choice first by offering another conf/cassandra-env-g1.sh file, so people can try it without changing the default settings.

          For the tests comparing G1 and CMS, do the tests cover some extreme cases? For example: bootstrap/rebuild/remove node, repair, lots of queries over tombstone_failure_threshold... And I think each test should run at least 24 hours, so there are several full GCs from which to estimate latency.

          Furthermore, with CMS we currently cap the max heap size at 8GB even if the node has much more memory. Is that cap still suitable for G1?

          atobey@datastax.com Albert P Tobey added a comment -

          Ariel Weisberg comparing promotions between G1 and CMS doesn't really make sense IMO. G1 promotions simply mark memory where it is without copying. After a threshold it will compact surviving cards into a single region. What I've observed is that compaction is rarely necessary with a big enough G1 heap. With a saturation write workload on Cassandra 2.1 only ~100-200MB seems to stick around for the long haul with almost all the rest getting cycled every few minutes (in an 8GB heap).

          Phil Yang I would keep the default heap at 8GB. I have tested with G1 at 16GB on a 30GB m3.2xlarge on EC2 and it generally gets better throughput and latency because there's more space for G1 to "waste" (that's what they call it). Intel tested up to 100GB with HBase at a 200ms pause target and said nice things about it. I don't see much need for C* to hit that size but it's certainly doable with G1. The main problem is smaller heaps where G1 starts to struggle a little, but I found that it still works OK down to 512MB, even if a bit less efficient than CMS since G1 targets ~10% CPU time for GC while the others target 1% by default.

          Throughput / latency is always a tradeoff and in the case of G1 with non-aggressive latency targets (-XX:MaxGCPauseMillis=2000) the throughput is darn close to CMS with considerably improved standard deviation on latency. IMO that's a great tradeoff as most of the users I talk to in the wild mostly struggle with getting reliable latency rather than throughput.

          IMO consistent performance should always take precedence over maximum performance/throughput. G1 provides a much more consistent experience with fewer knobs to mess with (especially tuning eden size, which is still a black art that nearly every installation I've looked at gets wrong).

          jbellis Jonathan Ellis added a comment - - edited

          > IMO consistent performance should always take precedence over maximum performance/throughput.

          Agreed. I think our bar here should be "Is G1 better for the average user" keeping in mind that the average user is a lot worse at tuning CMS than Ariel. Power users can tune for their own workload the way they always have.

          (Edit: this is not to say that more testing is or is not warranted, but let's approach this from, "if we were choosing a default today what would we do" and not "let's not switch unless we can prove that G1 is always better everywhere.")

          yangzhe1991 Phil Yang added a comment -

          Another small question: since many performance improvements were made to G1 in JDK 8 and its update releases, should we recommend that JDK 7 users, especially those on its early versions, update to the latest JDK 8?

          And I found a JEP for JDK 9 that proposes to "Make G1 the default garbage collector on 32- and 64-bit server configurations." See http://openjdk.java.net/jeps/248 if you have not heard about it.

          aweisberg Ariel Weisberg added a comment - - edited

          I would just like to see the data visualized. If it's not better in every dimension, then in which dimensions is it better or worse across all the data that Al collected?

          benedict Benedict added a comment -

          > if I am reading correctly there was pretty never an old generation collection under the workload I looked at. The old gen was growing but never reached the point it needed to do an old gen GC.

          > Another behavior to consider is worst case pause time when there is fragmentation.

          These are concerns we should not dismiss out of hand. My concern is that these benchmarks, run in an idealised world with a steady rate of work production, are not representative of a workload including repair, validation, long-running huge compactions, hinting, and periodic read/write load spikes. If these performance profiles are dependent on the memtables never being promoted, this is dependent on the disk keeping up, and under a worse but realistic workload the characteristics may be nothing like what Albert P Tobey is seeing. Changing these defaults should be done with the absolute utmost of care, and I would like to see a lot of very long running mixed workload tests including all of the other spanners in the works.

          atobey@datastax.com Albert P Tobey added a comment -

          "if I am reading correctly there was pretty never an old generation collection under the workload I looked at. The old gen was growing but never reached the point it needed to do an old gen GC."

          ^ G1 doesn't work that way.

          "Another behavior to consider is worst case pause time when there is fragmentation."

          ^ G1 performs compaction. It's fairly easy to trigger and observe in gc.log with Cassandra 2.0. It takes more work with 2.1 since it seems to be easier on the GC.

          I'll see if I can find some time to generate graphs to make all this more convincing, but time is short because I'm spending all of my time tuning users' clusters where the #1 first issue every time is getting CMS to behave.

          benedict Benedict added a comment -

          ^ G1 doesn't work that way.

          While it has no "old" generation, it does promote regions and if this happens a lot you can get some weird pathological fragmentation. Now, my experience with G1 is out of date, and I haven't kept up at all with its latest behaviours, but I saw some really atrocious behaviour on very simple benchmarks a few years back. At the time, if you modified references that were randomly distributed around the heap, it required traversing a majority of the heap to collect very little, and essentially thrashed. I realise it has improved, but I do not know in what ways, and so I'm wary of making it the default without being certain it no longer has pathological cases that are a problem for us. Unless we stress the collector so that it exercises its suboptimal characteristics, I am not really super confident. I hope this is simply overly cautious, but we know of users who also had serious problems with sudden degradation despite looking good in initial testing, and it would be great for that not to be a widespread problem.

          aweisberg Ariel Weisberg added a comment -

          ^ G1 doesn't work that way.

          I am talking about CMS. When I looked at the 12 gigabyte heap the old gen grew to 4.1 gigabytes and I didn't see any point that it shrunk.

          > I'm spending all of my time tuning users' clusters where the #1 first issue every time is getting CMS to behave.

          We can make the case for G1 in different ways. If we want to do it based on real world results that is fine with me.

          To Benedict's point I think looking at all the operations we care about on realistic time scales is something we would have to do to really know what the differences are. I wish we had this stuff in CI so it would just be a matter of changing the flags, but we aren't there yet.

          atobey@datastax.com Albert P Tobey added a comment -

          I only started messing with G1 this year, so I only know the old behavior by lore I've read and heard. I have not observed significant problems with it in the ~20-40 hours I've spent tuning clusters with G1 recently.

          I don't recommend anyone try G1 on JDK 7 < u75 or JDK 8 < u40 (although it's probably OK down to u20 according to the docs I've read). I did some testing on JDK7u75 and it was stable but didn't spend much time on it since JDK8u40 gave a nice bump in performance (5-10% on a customer cluster) by just switching JDKs and nothing else.

          From what I've read about the reference clearing issues, there is a new-ish setting to enable parallel reference collection, -XX:+ParallelRefProcEnabled. The advice in the docs is to only turn it on if a significant amount of time is spent on RefProc collection, e.g. " [Ref Proc: 5.2 ms]". I pulled that from a log I had handy and that is high enough that we might want to consider enabling the flag, but in most of my observations it hovers around 0.1ms under saturation load.
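
          If that flag were exposed, a sketch of how it might be documented (commented out) next to the other GC options in cassandra-env.sh; the flag is standard HotSpot, the placement here is just a suggestion:

          # uncomment if GC logs show significant time in "[Ref Proc: ...]"
          # JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"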

          benedict Benedict added a comment -

          > there is a new-ish setting to enable parallel reference collection

          Throughput was something like 5% of other collectors in my testing, so parallelizing this would only help so much!

          My point is, we don't really fully understand G1, and unless we undertake a research project to fully understand its pathological cases, and how they compare/contrast to its history, I'd prefer we ensured it behaved under complex loads, and not just under isolated read or write loading.

          mstump Matt Stump added a comment -

          Before we talk about changing the defaults I would like to see tests run on something more representative of customer hardware. At the very least we should be doing comparisons of CMS vs G1 for different workloads on cstar. I did some initial testing and didn't see a huge benefit, but I very well could have been doing something wrong. I'm both hopeful and skeptical.

          michael.perrone Michael Perrone added a comment - - edited

          I have done extensive load testing with G1GC with Java 1.7_80 and Cassandra 2.0.12.x versions with Solr secondary indexes and a 20GB max heap. On 8-core systems these options were the sweet spot for the test workload and worked out well in a production cluster, providing dramatic improvements in overall GC time and eliminating long CMS pauses that we could not tune out. I will try to attach some graphs/tables/metrics in another comment.

          JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
          # set these to the number of cores
          JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=8"
          JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=8"
          JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
          # default is 10, this makes G1 slightly more aggressive
          # by starting the marking cycle earlier
          # in order to avoid evacuation failure (OOM)
          JVM_OPTS="$JVM_OPTS -XX:G1ReservePercent=15"
          # default is 45, we should start sooner.
          # in a high write large heap (20GB) this was
          # found to eliminate Old gen pauses
          JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
          # default is 200; there is a tradeoff between latency
          # and throughput. increase this to increase throughput
          # at the cost of potential latency, up to 1000
          JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
          # use largest possible region size
          # speeds up marking phase, tradeoff is efficiency
          # comment out to let JVM decide the size
          JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32"
          
          atobey@datastax.com Albert P Tobey added a comment -

          Did you run into evacuation failures? How big was your heap? I haven't seen any evac failures with 2.1 and Java 8. This is one of the things that was worked on for Hotspot 1.8. Then again maybe it's Solr that needs the help.

          I suspect you can remove a lot of these settings on Java 8, but have also discovered that setting the GC threads is necessary on many machines.

          Try adding the below line for a nice decrease in p99 latencies.

          JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"

          jbellis Jonathan Ellis added a comment -

          Al, we're on board with making G1 the default for 3.0 (which incidentally requires java 8), if you can post a patch.

          michael.perrone Michael Perrone added a comment -

          Have not seen evacuation failures, but the test systems ran tight enough under heavy load that we increased the reserve percentage. Heap was 20GB.

          atobey@datastax.com Albert P Tobey added a comment -

          I'll attach a patch ASAP.

          atobey@datastax.com Albert P Tobey added a comment -

          https://github.com/tobert/cassandra/tree/g1gc
          https://github.com/tobert/cassandra/commit/33bf6719e0c8e84672c3633f8ecce602affc3071
          https://github.com/tobert/cassandra/commit/cafee86c3c5798e423689a26b43d05ed9312adc5

          jbellis Jonathan Ellis added a comment -

          Ariel Weisberg to review

          JoshuaMcKenzie Joshua McKenzie added a comment - - edited

          We also need to update conf/cassandra-env.ps1 and bin/cassandra.ps1 to keep Windows' configuration in parity with Linux on this ticket; the linked branches appear to be Linux-env only.

          edit: probably worth mentioning - we don't yet have the performance testing infrastructure to compare old v. new on Windows w/regards to gc types on the JVM and interactions w/MemoryManager in the kernel so this would be a "blind" change on Windows. Given the focus in the JVM on improvements to g1gc I'd prefer we keep the platforms in parity for now and test platform-specific nuances w/regards to GC once we have the infrastructure to do so.

          atobey@datastax.com Albert P Tobey added a comment -

          Yeah. I started on the PowerShell scripts but figured I should talk to someone more knowledgeable on Windows before making the change.

          If you want a straight port I can throw that together and do a quick test on my local Windows machine.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          If you can throw together and link a branch with a straight port I'll do some quick sanity testing on it both locally and on the CI server.

          For changing JVM params it should be really straightforward.

          atobey@datastax.com Albert P Tobey added a comment - https://github.com/tobert/cassandra/commit/0759be3b2a2a8ded0098622dcb95c0eb47d79fd3
          JoshuaMcKenzie Joshua McKenzie added a comment -

          A sanity check on my local laptop across a variety of workloads looks comparable between CMS and G1, with a slight edge to CMS, but I'm on a sub-8GB heap so that's to be expected.

          A couple of spelling nits in the comments in conf:
          misspelled "effecitve"
          vice versa, not visa-versa

          Haven't tested yet on CI as I have a dtest going I don't want to mess with but I'm comfortable moving forward w/the results from testing locally.

          atobey@datastax.com Albert P Tobey added a comment - Updated patches with spelling and whitespace fixes: https://github.com/tobert/cassandra/commits/g1gc-2 https://github.com/tobert/cassandra/commit/419d39814985a6ef165fdbafee5f1b84bf2f197b https://github.com/tobert/cassandra/commit/89d40af978eaeb02185726a63257d979111ad317 https://github.com/tobert/cassandra/commit/0f70469985d62aeadc20b41dc9cdc9d72a035c64
          JoshuaMcKenzie Joshua McKenzie added a comment -

          My point is, we don't really fully understand G1, and unless we undertake a research project to fully understand its pathological cases, and how they compare/contrast to its history, I'd prefer we ensured it behaved under complex loads, and not just under isolated read or write loading.

          Al, we're on board with making G1 the default for 3.0 (which incidentally requires java 8), if you can post a patch.

          Benedict and Jonathan Ellis: Reconcile these two statements for me as there don't appear to be updates on this ticket that bridge them. Were there discussions on IRC / offline or do you still have outstanding concerns about pathological cases w/this collector Benedict?

          jbellis Jonathan Ellis added a comment -

          Yes, IRC discussion. Basically the options are to exhaustively test the hell out of it in the lab or to throw it over the wall and let users test it.

          Since we don't have the resources for the former, and the latter is pretty low risk (you can always copy back in the 2.2 defaults yourself), and we have pretty solid evidence from early adopters that it's an improvement in the real world, we are going with the latter rather than continuing to sit on it.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          Seems reasonable to me and glad to have that documented in the ticket (hint hint).

          Last question I have before commit is for Albert P Tobey (though anyone's welcome to chime in): Why did we settle on 500ms for our MaxGCPauseMillis? Was that also part of an offline conversation, or are we just rolling with the #'s Michael used above? Do we have any test data to show that 500 is a solid default for our use case rather than the pre-packaged 200 for g1 or the 2000 you'd used above for some testing?

          atobey@datastax.com Albert P Tobey added a comment -

          I tested a number of different pause targets on a wide variety of machines. While the 200ms default is often fine on big machines with real CPUs, in Ghz-constrained environments like EC2 PVM or LV Xeons, throughput dropped considerably so that the GC could hit the pause target. I initially tested at 1000ms and 2000ms but settled on 500ms because it provides most of the benefit of a more generous pause target while being far enough below the current read/write timeouts in cassandra.yaml to make sure that pauses never/rarely hit those limits.

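          (For reference, a minimal sketch of how these choices look in cassandra-env.sh terms; the committed patch may carry additional flags, and only the two shown are taken from this discussion:)

          # Sketch only: enable G1 and set the pause target settled on above.
          JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
          JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"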
          JoshuaMcKenzie Joshua McKenzie added a comment -

          Sounds good and is easily configurable. Committed to trunk - thanks for the thorough work Al!

          Mortinke Sebastian Martinka added a comment -

          For future tests you could use logstash and kibana to generate graphs from G1 and CMS logs: https://github.com/Mortinke/logstash-pattern

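          (For anyone reproducing this, a hedged example of HotSpot (Java 7/8) GC logging flags that produce logs such patterns can parse; the log path is illustrative only:)

          # Illustrative GC logging options; adjust the path for your installation.
          JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
          JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
          JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
          JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"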
          benedict Benedict added a comment -

          We should revert this patch until we can do further analysis. We have significantly worse throughput and latency figures in 3.0, and much of this can be explained by the new GC settings. This is with more realistic workloads than we have previously benchmarked, and also with the current state of 3.0.

          These graphs paint a bleak picture. Throughput takes a 33% hit for point queries (Interestingly, there is a reduction in GC work done, presumably indicating that card marking / GC store barriers are to blame). Latency is much worse, and much less consistent, as is throughput, for all runs.

          We may be able to use this information to counteract the problem, but releasing 3.0 GA with this change as stands seems premature.

          atobey@datastax.com Albert P Tobey added a comment -

          Is the picture equally bleak at RF=3?

          Do the "2.2 GC" settings include anything other than the defaults from cassandra-env.sh? "ps -efw" output is sufficient.

          I'd be happy to take a look at the GC logs if they are available.

          jshook Jonathan Shook added a comment - - edited

          This seems pretty open-and-shut, where I would have expected a bit more of a nuanced test. We've honestly seen G1 be the operative improvement in some cases in the field. I'd much prefer to see "needs more analysis" than to see it resolved as fixed. CMS will not scale with hardware as we go forward. This is not in debate.

          Ah, never mind. I see that is what the status is now.

          benedict Benedict added a comment -

          Is the picture equally bleak at RF=3?

          Regrettably so

          Do the "2.2 GC" settings include anything other than the defaults from cassandra-env.sh? "ps -efw" output is sufficient.

          I haven't double checked, I simply copied T Jake Luciani's branch and rebased to latest 3.0. It looks like it's just 2.2 defaults.

          I'd be happy to take a look at the GC logs if they are available.

          The thing is, as I say, the GC burden is pretty consistently lower. However, the application performance is also worse, indicating the problem isn't the collections but the VM behavioural changes required to enable G1GC. So analyzing GC logs is unlikely to deliver much, and figuring out how to modify the application to reduce the burden here is unlikely to be a short task (if achievable).

          This is not in debate.

          I'm afraid nothing is not in debate in this world

          If you mean to say "CMS will not scale with increasingly gigantic heap sizes" then we would probably be in agreement, however with smallish heaps CMS works just fine - better, even. If the mid-to-long term goal of Cassandra is to have a constant heap burden, i.e. decouple heap requirements from dataset, then it doesn't follow that increasing hardware capabilities requires G1GC. There are lots of reasons why this should be our goal, and my understanding is there is a general consensus on that, but that's a separate debate.

          Certainly we need to do more research, but I will prognosticate briefly: I suspect we will find that with very large heaps (16GB+) and with lots of headroom G1GC begins to outperform CMS, especially on the most critical of metrics, the 99.9%ile. However, I suspect we will find CMS continues to dominate in domains where it can maintain sufficiently low pause times.

          Since many users target the more modest heap sizes, we may find that it makes most sense to provide two default configurations, and have the user opt into our "default" G1GC settings if they intend to run with a very large heap. If, after extensive research, we find that we can confidently predict configs where it makes more sense, we should consider doing this automatically in cassandra-env.

          My suspicion is we won't manage to do this research in time for GA, but that doesn't stop us providing the parallel defaults and documentation to make it easy for users to enable it.

          jshook Jonathan Shook added a comment - - edited

          I do believe that there is a gap between the maximum effective CMS heap sizes and the minimum effective G1 sizes. I'd estimate it to be about the 14GB-24GB range. Neither does admirably when taxed for GC throughput in that range. Put another way, I have never and would never advocate that someone use G1 with less than 24GB of heap. In practice, I use it only on systems with 64GB of memory, where it is no big deal to give G1 32GB to work with. We have simply seen G1 go slower when it doesn't have adequate scratch space. In essence, it really likes to have more memory.

          We have also seen anecdotal evidence that G1 seems to settle in, performance wise, after a warm-up time. It could be that it needs to collect metrics long enough under steady state before it learns how to handle GC and heap allocation better. This hasn't been proven out definitively, but is strongly evidenced in some longer-run workload studies.

          I do agree that when you don't really need more than 12GB of heap, CMS will be difficult to beat with the appropriate tunings. I'm not really sure what to do about the mid-band where neither CMS nor G1 are very happy. We may have to be prescriptive in the sense that if you want to use G1, then you should give it enough to work with effectively.

          Perhaps we need to make the startup scripts source a different GC config file depending on the detected memory in the system. I normally configure G1 as a sourced (included) file to the -env.sh script, so this would be fairly straightforward.

          Albert P Tobey, any comments on this?

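          (A minimal sketch of the memory-based selection described above, assuming hypothetical g1gc.options and cmsgc.options include files next to cassandra-env.sh and an illustrative 48GB threshold; none of these are agreed defaults:)

          # Sketch only: choose a GC include file based on detected system memory (Linux).
          # g1gc.options / cmsgc.options are hypothetical file names; CASSANDRA_CONF is
          # assumed to point at the conf directory.
          system_memory_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
          system_memory_gb=$((system_memory_kb / 1024 / 1024))

          if [ "$system_memory_gb" -ge 48 ]; then
              # Plenty of RAM: large heap plus G1, per the discussion above.
              . "$CASSANDRA_CONF/g1gc.options"
          else
              # Smaller machines: keep the tuned CMS defaults.
              . "$CASSANDRA_CONF/cmsgc.options"
          fi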
          atobey@datastax.com Albert P Tobey added a comment -

          The main point of switching to G1 was to enable most users to get decent - if not the best - performance out of the box without having to guess HEAP_NEWSIZE.

          Since nobody has the time or inclination to test/discover further, it might as well be rolled back. Users won't notice any difference in pain since there was never a release with G1.

          benedict Benedict added a comment -

          Sounds like we're broadly in agreement then.

          xedin Pavel Yaskevich added a comment -

          It would be interesting to know what happens in the longer running test, more than 10 minutes let's say, if there is a way to run it...

          jshook Jonathan Shook added a comment -

          I'd argue that there already is an increase in pain as you try to use more of the metal on a node. We've just become acclimated to it. Instead of scaling the compute side over the metal, we do silly things like run multiple instances per box. It's not really silly if it gets results, but it is an example of where we do something tactically, get so used to it as a necessary complexity, and then just keep taking for granted that this is how we do it. I personally don't want to keep going down this path. So, I am inclined to carry on with the testing and characterization, in time. We should compare notes and methods and see what can be done to reduce the overall effort.

          yangzhe1991 Phil Yang added a comment - - edited

          It seems that the max heap is still 8G in trunk so maybe it is unfair for G1. What is the heap size of these tests?

          benedict Benedict added a comment - - edited

          It would be interesting to know what happens in the longer running test,

          here is one 10x larger
          and another 100x larger is still running

          Interestingly, the completed 10x-larger run gives a more nuanced picture, but this is because CMS got slower for one of the runs, not because G1GC got faster. It never got back up to its prior speed, which is weird, since 2.2 still managed to reach its faster speed, so there may be something funny going on in that particular test.

          It seems that the max heap is still 8G in trunk so maybe it is unfair for G1. What is the heap size of these tests?

          I'm not sure I would call that unfair. These boxes have no more than 64GB of RAM, I'm pretty sure (perhaps only 32GB), so for anything but the most write-heavy workloads an 8GB heap is probably what you want, to ensure the file cache can make a meaningful contribution to performance. Certainly we can and should test more scenarios, but this is a pretty typical setup for a pretty typical piece of hardware. We should consider giving users profiles for different workloads, though, and consider raising the max heap much higher for write-heavy workloads or boxes with gigantic banks of RAM, and using G1 accordingly (as driven by our research).

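          (For anyone wanting to repeat the comparison with a bigger heap, the auto-calculated sizes can be overridden in cassandra-env.sh; the values below are illustrative only:)

          # Illustrative overrides; the stock script expects these to be set or unset
          # as a pair (HEAP_NEWSIZE is chiefly a CMS concern, G1 sizes its own young gen).
          MAX_HEAP_SIZE="16G"
          HEAP_NEWSIZE="4G"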
          xedin Pavel Yaskevich added a comment -

          Thanks, Benedict! This is in line with what I would expect from a longer run of CMS vs. G1, since CMS is known to degrade over time and G1 actually needs some time to collect statistics and properly arrange relocations etc., so let's hold off on the revert and see what the 100x larger run produces.

          benedict Benedict added a comment - - edited

          Regrettably, that run crashed. Will have to diagnose the logs to see what may have happened - Ryan McGuire, could you unstick the cstar job, and collect the log files?

          (The weird thing about CMS on that particular run is that 2.2 does not degrade; it is still 40% faster)

          enigmacurry Ryan McGuire added a comment - - edited Benedict Here's the logs: http://scp.datastax.com/~ryan/cstar_perf/01b714d8_logs.tar.bz2
          jeromatron Jeremy Hanna added a comment -

          I was kind of curious about whether 2.1 or 2.2 branches exhibited the same behavior with the same gc settings as other data points.

          benedict Benedict added a comment -

          Ryan McGuire: could you upgrade stress to run from this branch?

          Could you also ensure it's running with a largeish heap (at least a couple of GB)? I'll file tickets to update the mainline source tree on both these counts. We should start enabling stress GC logging in cstar at some point as well, so we can diagnose issues in the run and see if they're attributable to stress itself.

          enigmacurry Ryan McGuire added a comment -

          Benedict I modified the schedule GUI to allow you to change the version of stress per operation. Just change the default 'apache/trunk' to 'enigmacurry/stress-report-interval', where I took your branch and applied a 4G stress heap. If you want to tweak that, you can put your own branch name in instead. The GC logs have been collected for a while now; they're wrapped up in the same tarball as the other logs that you can download.

          See this example for how to specify your test: http://cstar.datastax.com/schedule?clone=0c2efd50-60d5-11e5-b6a8-42010af0688f

          benedict Benedict added a comment -

          Trying to get this running on blade_11b...

          Traceback (most recent call last):
            File "/home/ryan/git/cstar_perf/env/bin/cstar_perf_stress", line 9, in <module>
              load_entry_point('cstar-perf.tool==1.0', 'console_scripts', 'cstar_perf_stress')()
            File "/home/ryan/git/cstar_perf/tool/cstar_perf/tool/stress_compare.py", line 258, in main
              stress_compare(**cfg)
            File "/home/ryan/git/cstar_perf/tool/cstar_perf/tool/stress_compare.py", line 111, in stress_compare
              stress_shas = setup_stress(stress_revisions)
            File "/home/ryan/git/cstar_perf/tool/cstar_perf/tool/benchmark.py", line 406, in setup_stress
              revisions.update(build_stress(stress_revision))
            File "/home/ryan/git/cstar_perf/tool/cstar_perf/tool/benchmark.py", line 382, in build_stress
              raise AssertionError('Invalid stress_revision: {}'.format(stress_revision))
          AssertionError: Invalid stress_revision: enigmacurry/stress-report-interval
          

          Is the new code only deployed to blade_11, perhaps?

          enigmacurry Ryan McGuire added a comment -

          The code was deployed, but the git fetch didn't work. We'll fix that. I manually fetched from my repo, so it'll work with that stress revision for now.

          JoshuaMcKenzie Joshua McKenzie added a comment -

          Opened CASSANDRA-10403 to cover profiling and possible revert to CMS.


            People

            • Assignee: Unassigned
            • Reporter: jbellis Jonathan Ellis
            • Reviewer: Joshua McKenzie
            • Votes: 0
            • Watchers: 48
