Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7749

IndexOrDocValuesQuery not working with LRUQueryCache (?)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 6.6, 6.5.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I was experimenting with 6.5.0-SNAPSHOT and could not see any performance improvement using the new IndexOrDocValuesQuery where I would have expected some.
      I am using a basic FILTER query (term + point/dv range), along with IndexSearcher#search.
      Looking at the stack trace it seems that LRUQueryCache#CachingWrapperWeight not delegating the scorerSupplier method is the reason.

      Maybe it is on purpose for the result to be cacheable ? Does that mean IndexOrDocValuesQuery is not useable with the default IndexSearcher cache ? (Or maybe am I just completely misusing the IndexOrDocValuesQuery feature ?)

      Here is a thread dump of the call to IndexOrDocValuesQuery#scorerSupplier

      at org.apache.lucene.search.IndexOrDocValuesQuery$1.scorerSupplier(IndexOrDocValuesQuery.java:148)
      at org.apache.lucene.search.IndexOrDocValuesQuery$1.scorer(IndexOrDocValuesQuery.java:168)
      at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:746)
      at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
      at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
      at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
      at org.apache.lucene.search.Weight.bulkScorer(Weight.java:160)
      at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375)
      at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.cache(LRUQueryCache.java:704)
      at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:787)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:666)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)

        Activity

        Hide
        jpountz Adrien Grand added a comment -

        This is a bug indeed! Would you like to submit a patch? We are soon going to release 6.5.1 so I think it would be nice to have that fix in.

        Show
        jpountz Adrien Grand added a comment - This is a bug indeed! Would you like to submit a patch? We are soon going to release 6.5.1 so I think it would be nice to have that fix in.
        Hide
        jpountz Adrien Grand added a comment -

        I went ahead and worked on a patch to have it for 6.5.1 in time.

        Show
        jpountz Adrien Grand added a comment - I went ahead and worked on a patch to have it for 6.5.1 in time.
        Hide
        mikemccand Michael McCandless added a comment -

        +1

        Show
        mikemccand Michael McCandless added a comment - +1
        Hide
        marumarutan Martin Amirault added a comment - - edited

        Adrien Grand , thank you for working the on fix.
        I dont have lucene development set up here, but the patch may still need one more fix to work correctly (looking at 6.5.0 with your patch).
        Actually LRUQueryCache#CachingWrapperWeight not delegating the scorerSupplier was one of the problem, but there is another problem related to BulkScorer, if you look at the last part of the stack trace:

        ...
        at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
        at org.apache.lucene.search.Weight.bulkScorer(Weight.java:160)
        at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375)
        at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.cache(LRUQueryCache.java:704)
        at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:787)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:666)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)

        BooleanWeight.java:385 always get the score with randomAccess=false, which again will ignore any IndexOrDocValuesQuery improvements.
        Weight#bulkScorer should probably use the scorerSupplier method instead of the scorer method internally.

        Show
        marumarutan Martin Amirault added a comment - - edited Adrien Grand , thank you for working the on fix. I dont have lucene development set up here, but the patch may still need one more fix to work correctly (looking at 6.5.0 with your patch). Actually LRUQueryCache#CachingWrapperWeight not delegating the scorerSupplier was one of the problem, but there is another problem related to BulkScorer, if you look at the last part of the stack trace: ... at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381) at org.apache.lucene.search.Weight.bulkScorer(Weight.java:160) at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375) at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.cache(LRUQueryCache.java:704) at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:787) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:666) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473) BooleanWeight.java:385 always get the score with randomAccess=false, which again will ignore any IndexOrDocValuesQuery improvements. Weight#bulkScorer should probably use the scorerSupplier method instead of the scorer method internally.
        Hide
        jpountz Adrien Grand added a comment -

        Hi Martin, thanks for looking at the patch. It is expected that BulkScorer always gets a scorer with randomAccess=false since bulk scorers always need to return all matching documents. I don't think this is an issue.

        The stack trace that you shared happened because the query cache noticed that a BooleanQuery that is part of your query tree (it might be your top-level query) is being reused, and so it decides to build a cache entry for it (LRUQueryCache$CachingWrapperWeight.cache). Building a cache entry requires to consume all documents that match the query, so it makes sense that the scorer is created with randomAccess=false. Note that the fact that the boolean scorer is created with randomAccess=false does not necessarily mean that all scorers of sub clauses of the BooleanQuery will be created with randomAccess=false too: if the query is a conjunction then only the clauses that have the lowest cost will be created with randomAccess=false. As a consequence, the IndexOrDocValues optimization still applies if the range query is not the least costly clause.

        Show
        jpountz Adrien Grand added a comment - Hi Martin, thanks for looking at the patch. It is expected that BulkScorer always gets a scorer with randomAccess=false since bulk scorers always need to return all matching documents. I don't think this is an issue. The stack trace that you shared happened because the query cache noticed that a BooleanQuery that is part of your query tree (it might be your top-level query) is being reused, and so it decides to build a cache entry for it ( LRUQueryCache$CachingWrapperWeight.cache ). Building a cache entry requires to consume all documents that match the query, so it makes sense that the scorer is created with randomAccess=false . Note that the fact that the boolean scorer is created with randomAccess=false does not necessarily mean that all scorers of sub clauses of the BooleanQuery will be created with randomAccess=false too: if the query is a conjunction then only the clauses that have the lowest cost will be created with randomAccess=false . As a consequence, the IndexOrDocValues optimization still applies if the range query is not the least costly clause.
        Hide
        marumarutan Martin Amirault added a comment -

        I see. Thank you for checking !
        One last suggestion: Weight#scorer method seems quite prone to confusion with the Weight#scorerSupplier, and could lead to additionnal problems like it happened with the LRUQueryCache. Wouldnt it be safer to expose only Weight#scorerSupplier, and make the Weight#scorer method protected ?

        Show
        marumarutan Martin Amirault added a comment - I see. Thank you for checking ! One last suggestion: Weight#scorer method seems quite prone to confusion with the Weight#scorerSupplier , and could lead to additionnal problems like it happened with the LRUQueryCache. Wouldnt it be safer to expose only Weight#scorerSupplier , and make the Weight#scorer method protected ?
        Hide
        jpountz Adrien Grand added a comment -

        I agree it is a bit error-prone and removing the scorer method or hiding it would fix the issue. The way I have been approaching this issue is that it is a fairly new and experimental API so I'd like to keep it optional for now (ie. query impls do not have to implement this method). Now that this feature is released, hopefully we'll get feedback about how well it works, which will in-turn give us opportunities to improve this API and once we are more confident about it, we can think about removing the trap that you observed. For the record, this is my personal opinion and might not be shared by other people who work on this project.

        Show
        jpountz Adrien Grand added a comment - I agree it is a bit error-prone and removing the scorer method or hiding it would fix the issue. The way I have been approaching this issue is that it is a fairly new and experimental API so I'd like to keep it optional for now (ie. query impls do not have to implement this method). Now that this feature is released, hopefully we'll get feedback about how well it works, which will in-turn give us opportunities to improve this API and once we are more confident about it, we can think about removing the trap that you observed. For the record, this is my personal opinion and might not be shared by other people who work on this project.
        Hide
        marumarutan Martin Amirault added a comment -

        I agree, refactoring and api improvement should not be part of patch versions. Current code is good enough for the moment, and that could be improved as part of a more global refactoring in the future.

        Show
        marumarutan Martin Amirault added a comment - I agree, refactoring and api improvement should not be part of patch versions. Current code is good enough for the moment, and that could be improved as part of a more global refactoring in the future.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1ad6632e16c427ff9af1ddbd77d117db0d2e7c80 in lucene-solr's branch refs/heads/branch_6_5 from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1ad6632 ]

        LUCENE-7749: Made LRUQueryCache delegate the scoreSupplier method.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1ad6632e16c427ff9af1ddbd77d117db0d2e7c80 in lucene-solr's branch refs/heads/branch_6_5 from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1ad6632 ] LUCENE-7749 : Made LRUQueryCache delegate the scoreSupplier method.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit eb6a26cbfd1e92ba271ca82b23d2ff99adbc68f9 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eb6a26c ]

        LUCENE-7749: Made LRUQueryCache delegate the scoreSupplier method.

        Show
        jira-bot ASF subversion and git services added a comment - Commit eb6a26cbfd1e92ba271ca82b23d2ff99adbc68f9 in lucene-solr's branch refs/heads/branch_6x from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eb6a26c ] LUCENE-7749 : Made LRUQueryCache delegate the scoreSupplier method.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 2e545d78f5fe745905bcff19eb73a9a9faa4c032 in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e545d7 ]

        LUCENE-7749: Made LRUQueryCache delegate the scoreSupplier method.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 2e545d78f5fe745905bcff19eb73a9a9faa4c032 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e545d7 ] LUCENE-7749 : Made LRUQueryCache delegate the scoreSupplier method.

          People

          • Assignee:
            Unassigned
            Reporter:
            marumarutan Martin Amirault
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development