  1. Lucene - Core
  2. LUCENE-7050

Improve the query cache heuristic to detect costly queries

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Term queries, phrase queries and their combinations through boolean queries should not be cached too aggressively since they can efficiently make use of skip lists. However, we also have a number of queries that in practice need to visit all matches anyway, like PrefixQuery, TermsQuery, PointInSetQuery and PointRangeQuery, so caching them more aggressively can help avoid computing all documents that match in the whole index again and again.
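As a rough illustration of what "caching more aggressively" could mean, here is a minimal sketch of a frequency-based policy that caches costly queries after fewer uses than queries that can rely on skip lists. The class name `SketchCachingPolicy`, the string query keys and both thresholds are assumptions made up for this example; they are not taken from the patch or from Lucene's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical frequency-based caching policy; not Lucene's actual class. */
class SketchCachingPolicy {
    // Assumed thresholds, chosen only for illustration.
    static final int MIN_FREQUENCY_COSTLY = 2;
    static final int MIN_FREQUENCY_DEFAULT = 5;

    // Number of times each query (identified by a plain string key) was used.
    private final Map<String, Integer> seen = new HashMap<>();

    /** Records one use of the given query. */
    void onUse(String queryKey) {
        seen.merge(queryKey, 1, Integer::sum);
    }

    /** Costly queries qualify for caching after fewer uses than cheap ones. */
    boolean shouldCache(String queryKey, boolean costly) {
        int frequency = seen.getOrDefault(queryKey, 0);
        int threshold = costly ? MIN_FREQUENCY_COSTLY : MIN_FREQUENCY_DEFAULT;
        return frequency >= threshold;
    }
}
```

With these assumed thresholds, a prefix-style query seen twice would already be cached, while a term query seen twice would not.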

        Activity

        jpountz Adrien Grand added a comment -

        One problem is that some of these queries, like TermsQuery and PointInPolygonQuery, are in different modules (queries and sandbox respectively), so we can't refer to them from the caching policy. We could add a new API, but it doesn't feel right to me to add one only for the caching use-case. So maybe we could rely on the class name for popular costly queries that are in other modules? Here is a patch that demonstrates the idea. It is a bit hacky but maybe that's not too bad since the hack is very contained?
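The class-name idea could look roughly like the sketch below. It is not the attached patch: `CostlyQueryDetector` and its `isCostly` method are hypothetical, and the empty `TermsQuery` and `TermQuery` classes are stand-ins so that the example compiles without the Lucene modules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Stand-in for a costly query class that lives in another module. */
class TermsQuery {}

/** Stand-in for a cheap query that can make use of skip lists. */
class TermQuery {}

/** Hypothetical helper; detects costly queries without importing them. */
class CostlyQueryDetector {
    // Simple class names of costly queries that cannot be referenced directly.
    private static final Set<String> COSTLY_BY_NAME =
        new HashSet<>(Arrays.asList("TermsQuery", "PointInPolygonQuery"));

    /** Returns true if the query's simple class name is a known costly one. */
    static boolean isCostly(Object query) {
        return COSTLY_BY_NAME.contains(query.getClass().getSimpleName());
    }
}
```

The trade-off is the one raised in the comment: the string comparison avoids a cross-module dependency, but it breaks silently if a class is renamed, and it also matches any unrelated class that happens to share the name.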

        rcmuir Robert Muir added a comment -

        On the idea of a method or interface, are we sure caching is the only use case? What about e.g. reordering clauses for more efficiency and other potential uses?

        I'm not opposed to the hacky solution since it's just a default impl... But if we can come up with a very nice name... Then I think it would be cleaner for queries to confess that processing just a few docs is just as costly as processing all docs.

        jpountz Adrien Grand added a comment -

        Thanks for the feedback. Then I suggest pushing this change and separately exploring whether such a flag on queries (or maybe weights) could be used to better execute queries.

        jpountz Adrien Grand added a comment -

        "What about e.g. reordering clauses for more efficiency and other potential uses? [...] it would be cleaner for queries to confess that processing just a few docs is just as costly as processing all docs."

        I opened LUCENE-7055.

        jira-bot ASF subversion and git services added a comment -

        Commit 44324d3dfe34fb436595f8c15bfc97eb39564b1f in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=44324d3 ]

        LUCENE-7050: Cache TermsQuery and point queries more aggressively.


          People

          • Assignee:
            jpountz Adrien Grand
            Reporter:
            jpountz Adrien Grand
          • Votes:
            0
            Watchers:
            2
