Is the picture equally bleak at RF=3?
Do the "2.2 GC" settings include anything other than the defaults from cassandra-env.sh? "ps -efw" output is sufficient.
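For anyone collecting this, here is a minimal sketch of trimming the process listing down to the JVM memory and GC options. The helper name `extract_jvm_flags` is hypothetical, and the usage line assumes the Cassandra JVM appears with `CassandraDaemon` in its arguments.

```shell
# Hypothetical helper: read a command line on stdin and print only the
# heap-sizing (-Xms/-Xmx/-Xmn/-Xss) and -XX: options, one per line.
extract_jvm_flags() {
    tr ' ' '\n' | grep -E '^-(XX:|Xm[sxn]|Xss)'
}

# Usage (assumption: the JVM's arguments contain "CassandraDaemon"):
#   ps -efw | grep '[C]assandraDaemon' | extract_jvm_flags
```

The bracketed `[C]` in the grep pattern just keeps the grep process itself out of the results.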
I haven't double-checked; I simply copied T Jake Luciani's branch and rebased it onto the latest 3.0. It looks like it's just the 2.2 defaults.
I'd be happy to take a look at the GC logs if they are available.
The thing is, as I say, the GC burden is pretty consistently lower, yet application performance is also worse. That indicates the problem isn't the collections themselves, but the VM behavioural changes required to enable G1GC. So analyzing GC logs is unlikely to deliver much, and figuring out how to modify the application to reduce that burden is unlikely to be a short task (if it is achievable at all).
This is not in debate.
I'm afraid nothing is not in debate in this world.
If you mean to say "CMS will not scale with increasingly gigantic heap sizes" then we would probably be in agreement, however with smallish heaps CMS works just fine - better, even. If the mid-to-long term goal of Cassandra is to have a constant heap burden, i.e. decouple heap requirements from dataset, then it doesn't follow that increasing hardware capabilities requires G1GC. There are lots of reasons why this should be our goal, and my understanding is there is a general consensus on that, but that's a separate debate.
Certainly we need to do more research, but I will prognosticate briefly: I suspect we will find that with very large heaps (16GB+) and lots of headroom, G1GC begins to outperform CMS, especially wrt the most critical of metrics, 99.9%ile latency. However, I suspect we will find CMS continues to dominate in domains where it can maintain sufficiently low pause times.
Since many users target the more modest heap sizes, we may find that it makes most sense to provide two default configurations, and have the user opt into our "default" G1GC settings if they intend to run with a very large heap. If, after extensive research, we find that we can confidently predict the configs where G1GC makes more sense, we should consider selecting it automatically in cassandra-env.sh.
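To make the automatic option concrete, here is a rough sketch of what a heap-size switch in cassandra-env.sh might look like. Everything specific in it is an assumption for illustration: the function name `choose_gc_opts`, the `MAX_HEAP_SIZE_MB` variable, the 16GB cutoff (borrowed from my guess above, not from measurements), and the particular flag values, which are not a tested configuration.

```shell
# Assumed cutoff (hypothetical): heaps of 16GB or more get G1, smaller heaps stay on CMS.
G1_HEAP_THRESHOLD_MB=16384

# Print a set of GC options for the given max heap size in MB.
# The flag values are illustrative only, not tuned recommendations.
choose_gc_opts() {
    heap_mb=$1
    if [ "$heap_mb" -ge "$G1_HEAP_THRESHOLD_MB" ]; then
        echo "-XX:+UseG1GC -XX:MaxGCPauseMillis=500"
    else
        echo "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"
    fi
}

# Example wiring (MAX_HEAP_SIZE_MB is a hypothetical variable holding the heap size in MB):
#   JVM_OPTS="$JVM_OPTS $(choose_gc_opts "$MAX_HEAP_SIZE_MB")"
```

Until the research is done, the same function could simply be driven by an explicit user opt-in rather than by heap size.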
My suspicion is we won't manage to do this research in time for GA, but that doesn't stop us providing the parallel defaults and documentation to make it easy for users to enable it.