Lucene - Core / LUCENE-6685

GeoPointInBBox/Distance queries should have safeguards

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      These queries build a large list of term ranges, whose size is proportional to the number of cells of the space-filling curve that are "crossed" by the perimeter of the query (I think?).

      For a big enough query this list can easily reach hundreds of MB, and it is also slow to enumerate (we still repeat the enumeration for each segment).

      I think the queries should have safeguards, much like the maxDeterminizedStates limit we have for automaton-based queries, to prevent accidental OOMEs.
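
      A minimal sketch of what such a safeguard could look like, assuming the ranges are collected one at a time. The class BoundedRangeCollector, its maxRanges parameter, and TooManyRangesException are hypothetical names chosen for illustration; none of them are part of Lucene's API or of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical safeguard (not Lucene API): caps how many term ranges a query may collect. */
final class BoundedRangeCollector {

    /** Thrown when a query would need more term ranges than the caller allows. */
    static final class TooManyRangesException extends RuntimeException {
        TooManyRangesException(int maxRanges) {
            super("query requires more than " + maxRanges + " term ranges");
        }
    }

    private final int maxRanges;
    private final List<long[]> ranges = new ArrayList<>();

    BoundedRangeCollector(int maxRanges) {
        if (maxRanges <= 0) {
            throw new IllegalArgumentException("maxRanges must be positive");
        }
        this.maxRanges = maxRanges;
    }

    /** Records one inclusive [start, end] range, failing fast once the limit is reached. */
    void add(long start, long end) {
        if (ranges.size() >= maxRanges) {
            throw new TooManyRangesException(maxRanges);
        }
        ranges.add(new long[] {start, end});
    }

    /** Number of ranges collected so far. */
    int size() {
        return ranges.size();
    }
}
```

      Failing with an exception instead of silently truncating mirrors how a states limit on automaton-based queries would surface the problem: the caller learns the query is too expensive before memory runs out.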

      Longer term, though, I think we should either enumerate the ranges on demand and never store them in their entirety (like NumericRangeTermsEnum), or give the query a fixed budget of cells it is allowed to visit and have it use doc values to post-filter within each crossing cell.
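
      The fixed-budget idea can be sketched roughly as follows, under simplifying assumptions: an integer grid whose side is a power of two, an inclusive axis-aligned query box, and a plain quad-tree split standing in for the space-filling curve. BudgetedCellVisitor, Cell, and the budget accounting (four units per split, spent depth-first) are hypothetical and only illustrate the shape of the approach; the real queries and any doc-values post-filter would look different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Lucene API): quad-tree decomposition with a cell budget. */
final class BudgetedCellVisitor {

    /** A square cell; postFilter == true means its documents still need a per-document check. */
    static final class Cell {
        final int x, y, size;
        final boolean postFilter;

        Cell(int x, int y, int size, boolean postFilter) {
            this.x = x;
            this.y = y;
            this.size = size;
            this.postFilter = postFilter;
        }
    }

    private final int minX, minY, maxX, maxY; // inclusive query box
    private int budget;                       // remaining child cells we may create
    private final List<Cell> out = new ArrayList<>();

    BudgetedCellVisitor(int minX, int minY, int maxX, int maxY, int budget) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
        this.budget = budget;
    }

    /** Decomposes the root cell at (x, y) with a power-of-two side length. */
    List<Cell> visit(int x, int y, int size) {
        recurse(x, y, size);
        return out;
    }

    private void recurse(int x, int y, int size) {
        int cellMaxX = x + size - 1;
        int cellMaxY = y + size - 1;
        if (cellMaxX < minX || x > maxX || cellMaxY < minY || y > maxY) {
            return; // disjoint from the query box: nothing to emit
        }
        if (x >= minX && cellMaxX <= maxX && y >= minY && cellMaxY <= maxY) {
            out.add(new Cell(x, y, size, false)); // fully inside: every point matches
            return;
        }
        // The cell crosses the box perimeter.
        if (size == 1 || budget < 4) {
            out.add(new Cell(x, y, size, true)); // stop splitting, post-filter instead
            return;
        }
        budget -= 4; // splitting costs four child cells
        int half = size / 2;
        recurse(x, y, half);
        recurse(x + half, y, half);
        recurse(x, y + half, half);
        recurse(x + half, y + half, half);
    }
}
```

      With a budget of zero the whole space becomes a single post-filtered cell; with a generous budget every emitted cell is exact. The memory used for cells is therefore bounded by the budget rather than by the length of the query perimeter.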

        Attachments

        1. LUCENE-6685.patch
          6 kB
          Nicholas Knize
        2. LUCENE-6685.patch
          6 kB
          Nicholas Knize
        3. LUCENE-6685.patch
          11 kB
          Nicholas Knize

            People

            • Assignee: Unassigned
            • Reporter: Michael McCandless (mikemccand)
            • Votes: 0
            • Watchers: 3
