I'm afraid the endpoint inclusion as done by v2 is not as efficient as it could be. Consider a 5-node, RF=3, single-DC setup queried at CL.ONE. As it happens, the first endpoint for any given range won't be in the endpoint list of the next range, so we'll end up merging no ranges and doing 5 range queries, even though 2 would be enough to cover the whole ring.
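To make that concrete, here is a small sketch (not the actual patch code) of SimpleStrategy-style placement on a 5-node ring with RF=3: range i is replicated on nodes i, i+1, i+2 (mod 5), so the first replica of range i never shows up in the replica list of range i+1.

```java
import java.util.*;

public class ReplicaOverlap {
    // SimpleStrategy-style placement: range i is replicated on
    // nodes i, i+1, ..., i+rf-1 (mod nodes).
    static List<Integer> replicas(int range, int nodes, int rf) {
        List<Integer> l = new ArrayList<>();
        for (int j = 0; j < rf; j++)
            l.add((range + j) % nodes);
        return l;
    }

    public static void main(String[] args) {
        int nodes = 5, rf = 3;
        for (int i = 0; i < nodes; i++) {
            List<Integer> cur = replicas(i, nodes, rf);
            List<Integer> next = replicas((i + 1) % nodes, nodes, rf);
            // cur.get(0) is the "first endpoint" v2 keys on; it is
            // never a replica of the next range, so nothing merges.
            System.out.printf("range %d: replicas %s, next range %s, first-in-next=%b%n",
                              i, cur, next, next.contains(cur.get(0)));
        }
    }
}
```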
So to minimize the number of ranges queried, I'm pretty sure the best option is, for a given range, to consider the intersection of its endpoints with those of the next range. I'm attaching a v3 patch that implements what I have in mind.
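The idea can be sketched as a greedy merge: keep extending the current query as long as the intersection of live endpoints can still satisfy the consistency level, and start a new query otherwise. This is a simplified model, not the patch itself (the class and method names are made up for illustration):

```java
import java.util.*;

public class RangeMerger {
    // Greedy merge: extend the running query while the intersection of
    // live endpoints still has enough nodes to satisfy the CL.
    static List<Set<String>> mergeRanges(List<Set<String>> liveEndpointsPerRange,
                                         int requiredResponses) {
        List<Set<String>> merged = new ArrayList<>();
        Set<String> current = null;
        for (Set<String> endpoints : liveEndpointsPerRange) {
            if (current != null) {
                Set<String> intersection = new HashSet<>(current);
                intersection.retainAll(endpoints);
                if (intersection.size() >= requiredResponses) {
                    current = intersection; // merge this range into the running query
                    continue;
                }
                merged.add(current); // can't merge: flush and start over
            }
            current = new HashSet<>(endpoints);
        }
        if (current != null)
            merged.add(current);
        return merged;
    }

    public static void main(String[] args) {
        // 5-node ring, RF=3: range i replicated on nodes i, i+1, i+2 (mod 5)
        List<Set<String>> ranges = new ArrayList<>();
        for (int i = 0; i < 5; i++)
            ranges.add(new HashSet<>(Arrays.asList(
                    "N" + i, "N" + ((i + 1) % 5), "N" + ((i + 2) % 5))));
        // At CL.ONE (1 response needed) the 5 ranges collapse to 2 queries
        System.out.println("CL.ONE -> " + mergeRanges(ranges, 1).size() + " queries");
    }
}
```

On this toy placement the greedy merge covers the ring in 2 queries at CL.ONE, matching the figure above; the real query path splits ranges somewhat differently, so the exact counts there won't be identical to this model's.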
I note that this v3 pulls the logic that computes whether a list of live endpoints can fulfill a given consistency level out of ReadCallback and WriteResponseHandler and into the ConsistencyLevel class. The reason is that the patch needs that logic before the ReadCallback has been created, but I think this is a good refactor anyway, as this logic belongs in ConsistencyLevel.
This made me realise there is a complication, however, which is that we probably need to take datacenters and maybe even the endpoint latency scores into account. Say a range has replicas [A, B] and the next range has replicas [B, C], and CL == ONE. You could merge both ranges and send the request to B, but if B is in a remote datacenter while A and C are in the local one, doing 2 queries to A and C might actually be better. Same if B is local but very, very slow. To try to handle that, the v3 patch moves that decision to the snitch, and the default implementation considers only endpoints in the local DC when computing the intersection of endpoints used to decide whether we can/should merge two consecutive ranges. We could then have the dynamic snitch do something special, like excluding endpoints with a very bad latency score when computing the intersection, but I haven't implemented that yet because it's unclear to me where to draw the line.
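A minimal sketch of that default behaviour, restricting the merge decision to local-DC endpoints (the names here are hypothetical, not the patch's actual API):

```java
import java.util.*;

public class DcFilteredMerge {
    // Hypothetical illustration: compute the intersection of two ranges'
    // endpoints, then drop any endpoint outside the local DC. An empty
    // result means "don't merge".
    static Set<String> mergeCandidates(Set<String> current, Set<String> next,
                                       Map<String, String> dcOf, String localDc) {
        Set<String> intersection = new HashSet<>(current);
        intersection.retainAll(next);
        intersection.removeIf(ep -> !localDc.equals(dcOf.get(ep)));
        return intersection;
    }

    public static void main(String[] args) {
        Map<String, String> dcOf = new HashMap<>();
        dcOf.put("A", "dc1");
        dcOf.put("C", "dc1");
        dcOf.put("B", "dc2"); // B is the only shared replica, but it's remote
        Set<String> merged = mergeCandidates(
                new HashSet<>(Arrays.asList("A", "B")),
                new HashSet<>(Arrays.asList("B", "C")),
                dcOf, "dc1");
        // Empty local-DC intersection: reject the merge and query A and C locally
        System.out.println("local-DC intersection: " + merged);
    }
}
```

A dynamic-snitch variant would add a latency-score predicate to the same `removeIf` step, which is where the "where to draw the line" question shows up.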
I've done a few quick tests with this patch. For a 5-node, RF=3, single-DC setup, without the patch we query 5 ranges at CL.ONE and 10 at CL.QUORUM to cover the full ring (SELECT * FROM foo). With the patch, we query 2 ranges at CL.ONE and 6 at CL.QUORUM. And as expected, in the vnodes case with a single-node setup, the same SELECT * requires only 1 internal query instead of 256.