Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 6.2
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently the terms are seeked eagerly in TermQuery.createWeight when creating the TermContext. This might be wasteful when scores are not needed since the query might be cached on some segments, thus seeking the term on these segments is not needed. We could change TermWeight to only seek terms in Weight.scorer when scores are not needed.

      1. LUCENE-7311.patch
        12 kB
        Adrien Grand
      2. LUCENE-7311.patch
        12 kB
        Adrien Grand

        Activity

        Hide
        jpountz Adrien Grand added a comment -

        Here is a patch.

        Show
        jpountz Adrien Grand added a comment - Here is a patch.
        Hide
        rcmuir Robert Muir added a comment -

        I think this is too invasive of an optimization? Already the term-handling code in these queries (and this is by far the simplest one) is too much, I think "something has to give" to make the world a simpler place before we decide to do these kinds of optimizations? For example removal of querynorm/classicsimilarity could really start to simplify Weight.

        Show
        rcmuir Robert Muir added a comment - I think this is too invasive of an optimization? Already the term-handling code in these queries (and this is by far the simplest one) is too much, I think "something has to give" to make the world a simpler place before we decide to do these kinds of optimizations? For example removal of querynorm/classicsimilarity could really start to simplify Weight.
        Hide
        jpountz Adrien Grand added a comment -

        I agree I am not much of a fan either, but opened this issue for discussion since it is something that I have seen a couple times already with users who have (very) large boolean filters that get reused. Another way to address these use-cases would be to have a tiny TermState cache on top of the terms dictionary, this way this optimization would be decoupled from the Weight implementations (the patch addresses TermQuery but it affects all queries that work on top of terms, like eg. phrase queries).

        Removing queryNorm is appealing regardless of this change.

        Show
        jpountz Adrien Grand added a comment - I agree I am not much of a fan either, but opened this issue for discussion since it is something that I have seen a couple times already with users who have (very) large boolean filters that get reused. Another way to address these use-cases would be to have a tiny TermState cache on top of the terms dictionary, this way this optimization would be decoupled from the Weight implementations (the patch addresses TermQuery but it affects all queries that work on top of terms, like eg. phrase queries). Removing queryNorm is appealing regardless of this change.
        Hide
        rcmuir Robert Muir added a comment -

        Well, i personally think TermQuery/TermWeight should be very easy to follow and understand. If its not: there is no hope for any of our other queries.

        I don't think we should explicitly optimize for abusive cases here, potentially causing bugs and making the code impossible to work with.

        If we can simplify/redesign Weight so that this stuff is more natural, then I think thats fine, but I think there are too many special cases in the code already.

        Some of these special cases might be easy to fix, just by doing some janitorial work. e.g. what is TermContext.hasOnlyRealTerms()? Do we still need this, is only to support autoprefix? Should we just remove autoprefix and these apis now that we have points?

        Show
        rcmuir Robert Muir added a comment - Well, i personally think TermQuery/TermWeight should be very easy to follow and understand. If its not: there is no hope for any of our other queries. I don't think we should explicitly optimize for abusive cases here, potentially causing bugs and making the code impossible to work with. If we can simplify/redesign Weight so that this stuff is more natural, then I think thats fine, but I think there are too many special cases in the code already. Some of these special cases might be easy to fix, just by doing some janitorial work. e.g. what is TermContext.hasOnlyRealTerms()? Do we still need this, is only to support autoprefix? Should we just remove autoprefix and these apis now that we have points?
        Hide
        jpountz Adrien Grand added a comment -

        +1 the patch was a bit trappy to write so I agree we should look at simplifying things first

        Show
        jpountz Adrien Grand added a comment - +1 the patch was a bit trappy to write so I agree we should look at simplifying things first
        Hide
        jpountz Adrien Grand added a comment -

        I updated the patch so that it can be applied on a recent master checkout.

        Things are a bit simpler since the first iteration on this issue since we removed aute-prefix terms, Weight.getValueForNormalization, Weight.normalize. Are we good to go now?

        Show
        jpountz Adrien Grand added a comment - I updated the patch so that it can be applied on a recent master checkout. Things are a bit simpler since the first iteration on this issue since we removed aute-prefix terms, Weight.getValueForNormalization, Weight.normalize. Are we good to go now?
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit d5779335aa39b39e520849ccc01782880a3cfaaf in lucene-solr's branch refs/heads/branch_6x from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d577933 ]

        LUCENE-7311: Cached term queries do not seek the terms dictionary anymore.

        Show
        jira-bot ASF subversion and git services added a comment - Commit d5779335aa39b39e520849ccc01782880a3cfaaf in lucene-solr's branch refs/heads/branch_6x from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d577933 ] LUCENE-7311 : Cached term queries do not seek the terms dictionary anymore.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 71541bcd6cfd1e279faa1f2402403ac74cc5362d in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71541bc ]

        LUCENE-7311: Cached term queries do not seek the terms dictionary anymore.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 71541bcd6cfd1e279faa1f2402403ac74cc5362d in lucene-solr's branch refs/heads/master from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71541bc ] LUCENE-7311 : Cached term queries do not seek the terms dictionary anymore.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 71541bcd6cfd1e279faa1f2402403ac74cc5362d in lucene-solr's branch refs/heads/apiv2 from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71541bc ]

        LUCENE-7311: Cached term queries do not seek the terms dictionary anymore.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 71541bcd6cfd1e279faa1f2402403ac74cc5362d in lucene-solr's branch refs/heads/apiv2 from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71541bc ] LUCENE-7311 : Cached term queries do not seek the terms dictionary anymore.
        Hide
        mikemccand Michael McCandless added a comment -

        Bulk close resolved issues after 6.2.0 release.

        Show
        mikemccand Michael McCandless added a comment - Bulk close resolved issues after 6.2.0 release.

          People

          • Assignee:
            jpountz Adrien Grand
            Reporter:
            jpountz Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development