Are you suggesting that this new builder should be backported to 5.x and that the setters on BooleanQuery should be marked as deprecated?
I did not consider this option because it would make BooleanQuery have the API of an immutable query while nothing would guarantee that it cannot be modified since we would need to keep the old deprecated setters. So I just made it a breaking change for 6.0.
I think it would be good to get the API out there for 5.x users, so they can start using it. Even if the underlying BooleanQuery objects themselves are not immutable, encouraging people to use the Builder pattern to create them, and discouraging them from expecting to be able to mutate the BooleanQuery objects seems like a good idea to get out there as early as possible.
I am willing to document the fact that we clone sub queries, but I am on the fence about removing it, since without defensively cloning, the BooleanQuery would still be mutable while this issue is about making sure it cannot change.
There's a difference between saying the BooleanQuery is immutable and cannot change and saying the queries wrapped by the BooleanQuery are cloned & no longer accessible and cannot cahnge – that's something that wasn't clear from your issue description until I looked closer at your commit.
By comparison, "Collections.singletonList(T o)" is documented to return an immutable list, but it doesn't clone 'o' – it doesn't try to prevent you from calling o.changeState() after you construct that list.
For instance, consider the following sequence of operations: ...
The change in run time behavior of code like that is exactly why this change scares me – it's very similar to what users may have come to expect from code like my example if they want to do something like call REUSED_FILTER.setBost(foo) to dynamically tweak relevancy of many queries (all at once) at runtime.
One motivation behind this change is to be able to enable the query cache by default in IndexSearcher (currently off), which we can only do if queries can reliably be used as cache keys.
I can appreciate that goal, but i don't think it's ever going to be feasible to turn that on by default in the truly generic case of any arbitrary lucene application, where people might have custom Query impls.
Bottom line: i think there is a decent risk that this under the covers, no way to turn off, cloning of sub-queries will have some serious negative consequences for some use cases, but will really only help if you use a query cache and if you get a high hit rate on that cache and if your code is written weirdly/broken to call setBoost on queries after you use them – which doesn't make sense to do at all if you are using a query cache.
Consider again that REUSED_FILTER example i mentioned in my last comment – assuming the application is "well behaved", and doesn't call setBoost at add times: even w/o the implicit clone in BooleanQuery it should work great with a query cache enabled, and would use a lot less ram then with the implicit sub-query cloning in the BooleanQuery.Builder.
But if an application does start trying to keep refrences to previously constructed Query instances, and call mutating methods (like setBoost) at runtime, then really they aren't going to be able to safely use the query cache at all – regardless of whether you have this implicit clone in BooleanQuery's builder.