  1. Lucene - Core
  2. LUCENE-8058

Never cache large TermInSetQuery instances

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.2, 8.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      I have seen several cases in which the query cache significantly underestimated its memory usage because it held references to large queries that used more memory than their associated doc id sets.

      We had a workaround for term-in-set queries: TermInSetQuery implements Accountable. However, this information is lost when the query is wrapped in another query such as a BooleanQuery. So I would like to apply a safer fix that simply disables caching of large TermInSetQuery instances.

      I know it's a pity, since large queries are probably more expensive and thus more cache-worthy. But I see such large queries as the result of either a bad design or a workaround for the fact that Lucene is not the right tool for the job. So I think disabling caching on large term-in-set queries is the right trade-off: it makes the query cache safer for the majority of our users.
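
      To illustrate the idea, here is a minimal standalone sketch. It is not the code from the attached patch and does not use Lucene's classes: Query, TermQuery, TermSetQuery, BoolQuery, shouldCache and the MAX_CACHEABLE_TERMS threshold are all hypothetical stand-ins. It only shows why the decision has to look through wrapper queries: a large term set nested inside a boolean query should still make the whole query uncacheable.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CachingSketch {
    // Hypothetical threshold; the issue does not say what limit the patch uses.
    static final int MAX_CACHEABLE_TERMS = 16;

    // Hypothetical stand-ins for query types; these are NOT Lucene classes.
    interface Query {}

    static final class TermQuery implements Query {
        final String term;
        TermQuery(String term) { this.term = term; }
    }

    static final class TermSetQuery implements Query {
        final Set<String> terms;
        TermSetQuery(Set<String> terms) { this.terms = terms; }
    }

    static final class BoolQuery implements Query {
        final List<Query> clauses;
        BoolQuery(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Returns false if the query is, or contains, a term set above the threshold.
    static boolean shouldCache(Query query) {
        if (query instanceof TermSetQuery) {
            return ((TermSetQuery) query).terms.size() <= MAX_CACHEABLE_TERMS;
        }
        if (query instanceof BoolQuery) {
            // Look through the wrapper so a nested large term set is not missed.
            for (Query clause : ((BoolQuery) query).clauses) {
                if (!shouldCache(clause)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Helper that builds a term set with n distinct terms.
    static TermSetQuery termSetOfSize(int n) {
        Set<String> terms = new HashSet<>();
        for (int i = 0; i < n; i++) {
            terms.add("term" + i);
        }
        return new TermSetQuery(terms);
    }

    public static void main(String[] args) {
        Query small = termSetOfSize(3);
        Query large = termSetOfSize(1000);
        Query wrapped = new BoolQuery(new TermQuery("lucene"), large);
        System.out.println("small:   " + shouldCache(small));
        System.out.println("large:   " + shouldCache(large));
        System.out.println("wrapped: " + shouldCache(wrapped));
    }
}
```

      The point of the recursion is that the decision does not depend on the outermost query exposing its memory usage, which is where the Accountable-based workaround falls short.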

        Attachments

        1. LUCENE-8058.patch (8 kB, Adrien Grand)
        2. LUCENE-8058.patch (11 kB, Adrien Grand)

            People

            • Assignee: Unassigned
            • Reporter: Adrien Grand (jpountz)