Lucene - Core
  1. Lucene - Core
  2. LUCENE-3208

Move Query.weight() to IndexSearcher as protected method

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We had this issue several times, latest in LUCENE-3207.

      The method Query.weight() was left in Query for backwards reasons in Lucene 2.9 when we changed Weight class. This method is only to be called on top-level queries - and this is done by IndexSearcher. This method is just a utility method, that has nothing to do with the query itsself (it just combines the createWeight method and calls the normalization afterwards).

      The problem we have is that any query that wraps other queries (like CustomScore, ConstantScore, Boolean) calls Query.weight() instead of Query.createWeight(), it will do normalization two times, leading to strange bugs.

      For 3.3 I will make Query.weight() simply delegate to IndexSearcher's replacement method with a big deprecation warning, so user sees this. In IndexSearcher itsself the method will be protected to only be called by itsself or subclasses of IndexSearcher. Delegation for backwards is no problem, as protected is accessible by classes in same package.

      I would suggest the method name to be IndexSearcher.createNormalizedWeight(Query q)

      1. LUCENE-3208.patch
        17 kB
        Uwe Schindler
      2. LUCENE-3208.patch
        19 kB
        Uwe Schindler
      3. LUCENE-3208-LTC.patch
        3 kB
        Uwe Schindler
      4. LUCENE-3208.patch
        25 kB
        Uwe Schindler
      5. LUCENE-3208-3x.patch
        34 kB
        Uwe Schindler
      6. LUCENE-3208-3x.patch
        34 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          +1

          Show
          Robert Muir added a comment - +1
          Hide
          Simon Willnauer added a comment -

          +1

          Show
          Simon Willnauer added a comment - +1
          Hide
          Uwe Schindler added a comment -

          I started to rewrite some stuff, very straightforward.

          • BufferedDeletesStream has to be changed as it was also calling Query.weight, but I replaced the usage here by QueryWrapperFilter and getting the DocIdSet. Code gets much easier here.
          • QueryWrapperFilter's hack was rewritten, easy
          • in TestFrameWork, QueryUtils were also rewritten, they often use weight, but thats internal only.

          The main issue:
          In IndexSearcher is already a method called createWeight(Query) (which currently delegates to the Query). I moved the code over here. I have to still complain about the name, it creates a Weight yes, but it should also note that it rewrites and normalizes the weight. So I would like to rename that method, too and deprecate the old one.

          For now I leave the name unchanged. Patch comes soon (core only).

          Show
          Uwe Schindler added a comment - I started to rewrite some stuff, very straightforward. BufferedDeletesStream has to be changed as it was also calling Query.weight, but I replaced the usage here by QueryWrapperFilter and getting the DocIdSet. Code gets much easier here. QueryWrapperFilter's hack was rewritten, easy in TestFrameWork, QueryUtils were also rewritten, they often use weight, but thats internal only. The main issue: In IndexSearcher is already a method called createWeight(Query) (which currently delegates to the Query). I moved the code over here. I have to still complain about the name, it creates a Weight yes, but it should also note that it rewrites and normalizes the weight. So I would like to rename that method, too and deprecate the old one. For now I leave the name unchanged. Patch comes soon (core only).
          Hide
          Uwe Schindler added a comment -

          First patch with some minor hacks and 2 disabled tests.

          The problems:

          • 2 tests are in Span Package and call the IndexSearcher.createWeight method, which is protected. I commented them out for now
          • QueryValueSource is still to be investigated, I dont completely understand how it works, it looks fine for now (tests pass) but I have to follow how it works. Maybe Yonik can help. There was also a bug in fcontext preventing caching the weight. There is a reflection hack in it for now (nocommit)

          This patch also fixes:

          • Solr's BoostedQuery was fixed, too

          I still dont like the name of the protected method in IndexSearcher createWeight(), as it does more than that. It rewrites, creates the weight and then normalizes the query. I would like to rename it and make it maybe public, but expert only.

          For 3.x I would do the rename, too, and use VirtualMethod to fix invocations by 3rd party code if overridden.

          Show
          Uwe Schindler added a comment - First patch with some minor hacks and 2 disabled tests. The problems: 2 tests are in Span Package and call the IndexSearcher.createWeight method, which is protected. I commented them out for now QueryValueSource is still to be investigated, I dont completely understand how it works, it looks fine for now (tests pass) but I have to follow how it works. Maybe Yonik can help. There was also a bug in fcontext preventing caching the weight. There is a reflection hack in it for now (nocommit) This patch also fixes: Solr's BoostedQuery was fixed, too I still dont like the name of the protected method in IndexSearcher createWeight(), as it does more than that. It rewrites, creates the weight and then normalizes the query. I would like to rename it and make it maybe public, but expert only. For 3.x I would do the rename, too, and use VirtualMethod to fix invocations by 3rd party code if overridden.
          Hide
          Michael McCandless added a comment -

          +1

          Show
          Michael McCandless added a comment - +1
          Hide
          Uwe Schindler added a comment -

          Here the patch with IndexSearcher.createWeigth renamed to createNormalizedWeight() and public/expert, so Solr can access it and custom search code.

          I am currently thinking about a possibility to check that each Weight is only normaliized one time, possibly using setOnce(). Its not easy to do, maybe wrap the Weight returned by the IndexSearcher method using a WrappedWeight that throws UOE on normalize,

          Show
          Uwe Schindler added a comment - Here the patch with IndexSearcher.createWeigth renamed to createNormalizedWeight() and public/expert, so Solr can access it and custom search code. I am currently thinking about a possibility to check that each Weight is only normaliized one time, possibly using setOnce(). Its not easy to do, maybe wrap the Weight returned by the IndexSearcher method using a WrappedWeight that throws UOE on normalize,
          Hide
          Yonik Seeley added a comment -

          +1, looks good!
          Doesn't seem like it's worth the trouble to catch Weight being normalized more than once. I'd say just commit this as is.

          Show
          Yonik Seeley added a comment - +1, looks good! Doesn't seem like it's worth the trouble to catch Weight being normalized more than once. I'd say just commit this as is.
          Hide
          Robert Muir added a comment -

          i think its worth the trouble, if we can do it.

          we shouldnt rely upon the fact that getting sumOfSquaredWeights in some of these weights currently has side effects and sometimes is just wasted computation.

          other times it creates wrong scores.

          Show
          Robert Muir added a comment - i think its worth the trouble, if we can do it. we shouldnt rely upon the fact that getting sumOfSquaredWeights in some of these weights currently has side effects and sometimes is just wasted computation. other times it creates wrong scores.
          Hide
          Yonik Seeley added a comment -

          I think it's worth to do it in tests... but not as part of the public API (weighting yourself is expert level only and most people don't do it). Wrapping every weight just makes things uglier, esp if you want to do something with the produced weight.

          Show
          Yonik Seeley added a comment - I think it's worth to do it in tests... but not as part of the public API (weighting yourself is expert level only and most people don't do it). Wrapping every weight just makes things uglier, esp if you want to do something with the produced weight.
          Hide
          Uwe Schindler added a comment -

          A second idea would be that LuceneTestCase.newSearcher() returns such a Searcher, that wraps and disallows this. We have other helper classes like MockDirectory asserting similar things.

          I am currently thinking about coding this, its just a few lines.

          Show
          Uwe Schindler added a comment - A second idea would be that LuceneTestCase.newSearcher() returns such a Searcher, that wraps and disallows this. We have other helper classes like MockDirectory asserting similar things. I am currently thinking about coding this, its just a few lines.
          Hide
          Robert Muir added a comment -

          Wrapping every weight just makes things uglier, esp if you want to do something with the produced weight.

          It doesn't have to be done this way necessarily. Personally i would be happy if TermWeight had a boolean 'normalized' (used only for asserting) and an assert.

          it doesn't have to be totally perfect, but, I refuse to debug this issue again.

          If its not done here, I will open a blocker issue!

          Show
          Robert Muir added a comment - Wrapping every weight just makes things uglier, esp if you want to do something with the produced weight. It doesn't have to be done this way necessarily. Personally i would be happy if TermWeight had a boolean 'normalized' (used only for asserting) and an assert. it doesn't have to be totally perfect, but, I refuse to debug this issue again. If its not done here, I will open a blocker issue!
          Hide
          Uwe Schindler added a comment -

          Here is my idea to enforce one-time normalizing and prevent side-effects during tests.

          Show
          Uwe Schindler added a comment - Here is my idea to enforce one-time normalizing and prevent side-effects during tests.
          Hide
          Robert Muir added a comment -

          Great, Uwe, I'm satisfied.

          Sorry for being so vocal about this, but i wasted many hours on this stupid bug (I know you did before, too), and the bug is not very friendly to people that debug with System.out.println, you don't catch it until you pull out enough of your hair to start using Thread.dumpStack...

          Show
          Robert Muir added a comment - Great, Uwe, I'm satisfied. Sorry for being so vocal about this, but i wasted many hours on this stupid bug (I know you did before, too), and the bug is not very friendly to people that debug with System.out.println, you don't catch it until you pull out enough of your hair to start using Thread.dumpStack...
          Hide
          Uwe Schindler added a comment -

          New patch:

          • Added AssertingIndexReader in test-framework, this one ensures that weights are only normalized once when this is done by IndexSearcher. This class can be extended to add further checks
          • As suggested by Yonik, changes the key used for fContext in the QueryValueSource to be the valuesource itsself. The backup code cannot be removed, there is somewhere a bug (new issue)

          All tests pass. I would like to commit this to trunk soon and then add sophisticated backwards for 3.x

          Show
          Uwe Schindler added a comment - New patch: Added AssertingIndexReader in test-framework, this one ensures that weights are only normalized once when this is done by IndexSearcher. This class can be extended to add further checks As suggested by Yonik, changes the key used for fContext in the QueryValueSource to be the valuesource itsself. The backup code cannot be removed, there is somewhere a bug (new issue) All tests pass. I would like to commit this to trunk soon and then add sophisticated backwards for 3.x
          Hide
          Uwe Schindler added a comment -

          Committed trunk revision: 1136568

          Now backporting and adding sophisticated backwards...

          Show
          Uwe Schindler added a comment - Committed trunk revision: 1136568 Now backporting and adding sophisticated backwards...
          Hide
          Uwe Schindler added a comment -

          Missed a change in the new grouping module: Trunk revision: 1136605

          Show
          Uwe Schindler added a comment - Missed a change in the new grouping module: Trunk revision: 1136605
          Hide
          Uwe Schindler added a comment -

          Patch for 3.x branch. To apply, copy the trunk's AssertingIndexSearcher first to its target dir and then apply patch.

          Show
          Uwe Schindler added a comment - Patch for 3.x branch. To apply, copy the trunk's AssertingIndexSearcher first to its target dir and then apply patch.
          Hide
          Uwe Schindler added a comment -

          Thanks to Robert for helping me to find the bug in TestSimilaroity. It was caused by a copypaste error when nuking Searcher. Searcher.java and also IndexSearcher.java had a private similarity field and separate setters. Methods implemented by Searcher would therefore not see changes on Similarity done on the IndexSearcher.

          Show
          Uwe Schindler added a comment - Thanks to Robert for helping me to find the bug in TestSimilaroity. It was caused by a copypaste error when nuking Searcher. Searcher.java and also IndexSearcher.java had a private similarity field and separate setters. Methods implemented by Searcher would therefore not see changes on Similarity done on the IndexSearcher.
          Hide
          Uwe Schindler added a comment -

          Committed 3.x revision: 1136702

          Show
          Uwe Schindler added a comment - Committed 3.x revision: 1136702
          Hide
          Robert Muir added a comment -

          the backport looks good, and important/scary to also fix this IndexSearcher/Searcher bug.

          Show
          Robert Muir added a comment - the backport looks good, and important/scary to also fix this IndexSearcher/Searcher bug.
          Hide
          Robert Muir added a comment -

          bulk close for 3.3

          Show
          Robert Muir added a comment - bulk close for 3.3

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development