The main thing that confuses me about what I see is the separation between TwoPhase & TwoPhaseApproximation despite the comments. Couldn't TwoPhase.verify return true, and getApproximation return ‘this’?
It is confusing. TwoPhase does two-phase intersection: it works on approximations, but it is an "exact" scorer, e.g. it's what is used if you AND(term, phrase).
However, it's possible to have nested conjunctions such as AND(term1, AND(term2, phrase)). So ConjunctionScorer itself supports approximations when any of its subs do. TwoPhaseApproximation is this impl, which defers matches() to the caller.
This way confirmation is deferred until there is "global docid" agreement across the whole query tree. With this patch it's only going to work with nested conjunctions, because that's all I implemented it for. Obviously, for it to work across the board (meaning you can put geo/phrase queries at arbitrary places in the query/filter tree and everything "works"), disjunctions and other boolean-like scorers must implement the API too when their subs support approximations.
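To make the deferral concrete, here is a minimal standalone sketch of the idea for AND(term1, AND(term2, phrase)). It is not the code from the patch: the names Approx, Leaf, Conjunction and collect are made up for illustration, and the "phrase" leaf just fakes its expensive check with a set of confirmed doc ids. The point is only that the nested conjunction exposes an approximation, and the costly matches() call runs once all approximations agree on a docid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from the patch or from Lucene.
class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Assumed shape: a cheap approximation plus a costly confirmation step. */
    interface Approx {
        int docID();
        int advance(int target); // move to the first candidate >= target
        boolean matches();       // confirm the current candidate (may be expensive)
    }

    /** Leaf over sorted candidate doc ids; confirmed == null means every candidate matches. */
    static final class Leaf implements Approx {
        private final int[] candidates;
        private final Set<Integer> confirmed;
        private int idx = -1;
        int verifyCalls = 0; // counts how often the "expensive" check ran

        Leaf(int[] candidates, Set<Integer> confirmed) {
            this.candidates = candidates;
            this.confirmed = confirmed;
        }

        public int docID() {
            if (idx < 0) return -1;
            return idx < candidates.length ? candidates[idx] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            while (docID() < target) idx++;
            return docID();
        }

        public boolean matches() {
            verifyCalls++;
            return confirmed == null || confirmed.contains(docID());
        }
    }

    /** Conjunction that is itself an approximation: it defers matches() to its caller. */
    static final class Conjunction implements Approx {
        private final Approx[] subs;
        private int doc = -1;

        Conjunction(Approx... subs) { this.subs = subs; }

        public int docID() { return doc; }

        public int advance(int target) {
            int candidate = target;
            boolean agreed = false;
            // Zig-zag: keep raising the candidate until every sub sits on the same doc.
            while (!agreed && candidate != NO_MORE_DOCS) {
                agreed = true;
                for (Approx sub : subs) {
                    int d = sub.docID() < candidate ? sub.advance(candidate) : sub.docID();
                    if (d != candidate) { candidate = d; agreed = false; break; }
                }
            }
            return doc = candidate;
        }

        public boolean matches() {
            // Only reached once the whole tree agrees on the current doc.
            for (Approx sub : subs) {
                if (!sub.matches()) return false;
            }
            return true;
        }
    }

    /** Top-level driver: advance on approximations, then confirm each agreed doc. */
    static List<Integer> collect(Approx top) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = top.advance(0); doc != NO_MORE_DOCS; doc = top.advance(doc + 1)) {
            if (top.matches()) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        Leaf term1 = new Leaf(new int[] {1, 3, 5, 7, 9, 12}, null);
        Leaf term2 = new Leaf(new int[] {3, 4, 5, 9, 12}, null);
        // The phrase approximation has 7 candidates, but only 5, 12 and 15 really match.
        Leaf phrase = new Leaf(new int[] {2, 3, 5, 8, 9, 12, 15},
                new HashSet<>(Arrays.asList(5, 12, 15)));
        Approx query = new Conjunction(term1, new Conjunction(term2, phrase));
        System.out.println(collect(query));     // [5, 12]
        System.out.println(phrase.verifyCalls); // 4
    }
}
```

In this toy data the phrase leaf has seven candidates, but it is only asked to confirm the four docids (3, 5, 9, 12) that all three approximations agree on. The nested Conjunction never confirms anything on its own; it just passes matches() down when the outer one asks.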
BTW, please don't generalize all geo as being slow; there are multiple strategies with performance trade-offs for implementing geo.
It was not a stab at geo or anything. Phrases are in the same category. It is just another use case where verifying that the document is actually a match is more costly than moving to the next "possible" document for the purpose of zig-zag intersection. ExactPhraseScorer is a tricky case since it's not TOO terribly expensive to verify a match, but it should still be a win. That's why I tried to prototype with it first.