I suppose it would help "big" distance queries more and maybe hurt "tiny" distance queries, since it does the up front work

I think the only scenario that gets worse is when the distance is so tiny that the distance range is always contained in a single BKD cell. As soon as you start having crossing cells, that cost is quickly amortized. For instance, say your index has 30 segments with one crossing cell each (which is still a best-case scenario), we already need to perform 30*1024~=30k distance computations. On the other hand, this change needs to do 4096*4~=16k up-front distance computations (regardless of the number of segments since it is computed for a whole query) so if it allows to save 1/2 distance computations, its cost is already amortized.

the same up front work is done twice, and one of them won't be used

True, this should be easy to fix!

Since you use bit shifting, it looks like the number of effective cells may be anywhere between 1024 and 4096 right? Do you think two straight integer divisions instead, which could get us usually to 4096 cells, is too costly per hit?

You are right about the fact that there are lost cells. Avoiding integer divisions was one reason in favor of bit shifting, but there was another one, which is that they do not create boxes that cross the dateline.

That said, you make a good point that we should not have to both store and compute relations for those lost cells, let me look into fixing that.

Here is a patch. I did not want the doc-values distance query to perform distance computations all the time so I ended up working on a way to split the bounding box around the distance circle into 4096 smaller boxes, and pre-computing how they relate to the distance query (I reused the logic that we already had in LatLonPointDistanceQuery for the inner nodes). I also switched LatLonPointDistanceQuery to it (for the leaf nodes that return Relation.CELL_CROSSES_QUERY), and interestingly it yielded a 8% speedup when tested with

IndexAndSearchOpenStreetMaps.I would appreciate if someone who is more comfortable with geo could review as maybe the way

LatLonPointDistanceQuerycomputes relations between a box and a circle relies on assumptions that are not met in this new code?