I think we should keep the issue open; I've been thinking about this one a lot lately.
The way I see it, for this to work really nicely, BooleanQuery needs to own execution of both queries and filters.
A rough proposal/plan, something like this:
Execute filters in BooleanQuery instead of its mini-me (FilteredQuery), e.g. as an additional type of BooleanClause.
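A minimal sketch of what a filter clause could look like; Occur.FILTER is an assumption of this proposal, not an existing Lucene API, and the classes below are simplified stand-ins:

```java
// Hypothetical sketch: a FILTER occur type on BooleanClause, so BooleanQuery
// owns filter execution instead of delegating to FilteredQuery.
enum Occur { MUST, SHOULD, MUST_NOT, FILTER }

class Clause {
    final Occur occur;
    final float score; // score this clause would contribute if it matched
    Clause(Occur occur, float score) { this.occur = occur; this.score = score; }
}

class BooleanScoreSketch {
    // A FILTER clause restricts matches like MUST does, but contributes no score.
    static float score(Clause[] clauses) {
        float total = 0f;
        for (Clause c : clauses) {
            if (c.occur == Occur.MUST || c.occur == Occur.SHOULD) {
                total += c.score;
            }
            // FILTER and MUST_NOT never contribute to the score.
        }
        return total;
    }
}
```

The point is that BooleanQuery would then see filters as just another clause type and could plan their execution together with the scoring clauses.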
Merge Filter and Weight, in some way that makes sense, e.g. maybe just make Weight.scorer(LeafReaderContext context, Bits acceptDocs) a covariant-return override of Filter.getDocIdSet(LeafReaderContext context, Bits acceptDocs). Make sure any "wrappers" like ConstantScore delegate any new APIs correctly.
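To illustrate the covariant-return idea with simplified stand-in types (the real methods take LeafReaderContext and Bits; those parameters are dropped here for brevity):

```java
// If Scorer is-a DocIdSet, Weight's method can narrow Filter's return type.
class DocIdSet { }
class Scorer extends DocIdSet {
    float score() { return 1f; } // scoring only exists on the narrowed type
}

class Filter {
    DocIdSet getDocIdSet() { return new DocIdSet(); }
}

class Weight extends Filter {
    @Override
    Scorer getDocIdSet() { // covariant return: legal since Java 5
        return new Scorer();
    }
}
```

Callers that only need matching docs can treat a Weight as a Filter, while scoring callers get a Scorer from the same method.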
Add bulk methods like and/or/not to Filter so that optimized impls like FixedBitSet.and() can be used. Since Java 7u40 these get autovectorized by HotSpot and are a valid strategy. I think some of these could be optimized by sparse bitset impls as well.
Create an enhanced cost metric/execution API for filters. BooleanQuery needs this additional context to give the most efficient execution. At the least, it should have the information to know to do the bulk optimizations above, and even apply deletes this way if it's appropriate (in Lucene 5 deleted docs are a FixedBitSet). I would also want a way to indicate that a Filter has a linear-time nextDoc(). These cases (e.g. filtering by exact geographic distance) are horrible to support, but handling them correctly (e.g. in a final phase) is a lesser evil than having the API be crazy so that systems like Solr/ES can do them with hacks.
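A sketch of the kind of metadata a Filter could expose; none of these methods exist in Lucene, the names are purely illustrative:

```java
// Hypothetical execution metadata a Filter could report so BooleanQuery
// can choose a strategy: bulk bitset ops, leap-frogging, or a final phase.
interface FilterExecutionInfo {
    long cost();                 // estimated number of matching docs
    boolean supportsBulkOps();   // e.g. backed by FixedBitSet: and()/or()/not()
    boolean linearNextDoc();     // e.g. exact geo distance: defer to a final phase
}
```

With something like this, BooleanQuery could order cheap filters first, apply bitset-backed ones in bulk, and push linear-time ones into a last verification pass over the surviving docs.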
Remove stuff like FilteredQuery, BooleanFilter, etc.
Fix LUCENE-3331 (or impl in some other way), such that "scores are not needed" is passed down the query execution stack. The tricky part is that BQ's "execution plan" currently lives in two places, rewrite() and Weight.scorer(). And I really think it needs the freedom to completely restructure queries for performance (across nested BQs as well). Another option is to set up internal infra so BooleanWeight.scorer() can do this, as it has cost() knowledge too, but it feels so wrong.
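One way to picture passing "scores are not needed" down the stack; the types and the needsScores parameter here are illustrative, not the Lucene API:

```java
// Each query sees the needsScores flag and may restructure itself.
abstract class Q {
    abstract Q rewrite(boolean needsScores);
}

class Term extends Q {
    final String term;
    Term(String term) { this.term = term; }
    Q rewrite(boolean needsScores) { return this; }
}

// A boosting wrapper is pure overhead when nobody reads the score.
class Boost extends Q {
    final Q inner;
    final float boost;
    Boost(Q inner, float boost) { this.inner = inner; this.boost = boost; }
    Q rewrite(boolean needsScores) {
        Q in = inner.rewrite(needsScores); // flag propagates to nested queries
        return needsScores ? new Boost(in, boost) : in; // drop the wrapper
    }
}
```

The same propagation would let a nested BooleanQuery under a filter-only context drop coord/score machinery entirely.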
Finally, we should add some support for "two-phase execution" via DISI.getSuperSet() or some other approximation. ConjunctionScorer could both use this method (when at least one sub supports it) and implement it (when e.g. coord scoring prevents optimal restructuring and it's nested) for faster AND/filtering of phrase/sloppy/spans/whatever, or for any other custom query/filter that supports a fast approximation.
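The two-phase idea in miniature; getSuperSet()/matches() are names from this proposal, not a shipped API, and the int[] approximation stands in for a real DISI:

```java
// Phase 1: iterate a cheap superset (e.g. the term conjunction of a phrase).
// Phase 2: run the expensive per-doc check (e.g. positions) only on survivors.
interface ApproximableIterator {
    int[] approximation();     // cheap superset of the matching docs
    boolean matches(int doc);  // costly confirmation, e.g. phrase positions
}

class TwoPhase {
    static java.util.List<Integer> execute(ApproximableIterator it) {
        java.util.List<Integer> out = new java.util.ArrayList<>();
        for (int doc : it.approximation()) {
            if (it.matches(doc)) { // expensive check deferred to phase 2
                out.add(doc);
            }
        }
        return out;
    }
}
```

In a conjunction, the other clauses would leap-frog against the cheap approximation first, so the expensive check runs on far fewer docs.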