Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/query/scoring
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Originally this was part of the patch for per-field Similarity (LUCENE-2236), but I pulled it
      out here as its own issue as its really mostly unrelated. I also like it as a separate issue
      to apply the deprecation to branch_3x to just make less surprises/migration hassles for 4.0 users.

      Currently Scorer takes a confusing number of ctors, either a Similarity, or a Weight + Similarity.
      Also, lots of scorers don't use the Similarity at all, and its not really needed in Scorer itself.
      Additionally, the Weight argument is often null. The Weight makes sense to be here in Scorer,
      its the parent that created the scorer, and used by Scorer itself to support LUCENE-2590's features.
      But I dont think all queries work with this feature correctly right now, because they pass null.

      Finally the situation gets confusing if you start to consider delegators like ScoreCachingWrapperScorer,
      which arent really delegating correctly so I'm unsure features like LUCENE-2590 aren't working with this.

      So I think we should remove the getSimilarity, if your scorer uses a Similarity its already coming
      to you via your ctor from your Weight and you can manage this yourself.

      Also, all scorers should pass the Weight (parent) that created them, and this should be Scorer's only ctor.
      I fixed all core/contrib/solr Scorers (even the internal ones) to pass their parent Weight, just for consistency
      of this visitor interface. The only one that passes null is Solr's ValueSourceScorer.

      I set fix-for 3.1, not because i want to backport anything, only to mark the getSimilarity deprecated there.

      1. LUCENE-2876.patch
        38 kB
        Robert Muir
      2. LUCENE-2876.patch
        42 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        I'd like to commit this later today if there aren't any objections (its just boring cleanup).

        As far as 3.1, i rethought the issue, and i think e.g. DisjunctionMaxQuery should really work with LUCENE-2590.

        So i'll look at fixing trying to pass non-null weight in 3.x too, i think users will see it as a bug...

        Show
        Robert Muir added a comment - I'd like to commit this later today if there aren't any objections (its just boring cleanup). As far as 3.1, i rethought the issue, and i think e.g. DisjunctionMaxQuery should really work with LUCENE-2590 . So i'll look at fixing trying to pass non-null weight in 3.x too, i think users will see it as a bug...
        Hide
        Doron Cohen added a comment -

        Patch looks good.

        +1 for this cleanup which removes calls with null arg and comment

        {"//similarity not in use"}

        .

        Some minor comments:

        • jdoc: in some of the scorer constructors a Weight param was added but existing jdocs for the costructor which document (some) params was not updated to also mention the weight. I am not 100% sure this should be fixed, as there is inconsistency in the level of jdoc for scorer implementations. So if there were no jdocs at all there I would say nothing, but since there were some now they became less complete...
        • ExactPhraseScorer is created with both Weight and Similarity - I think the Similarity param can be removed as part of this cleanup.
        • Same for SloppyPhraseScorer, PhraseScorer, SpanScorer, TermScorer, MatchAllDocsScorer - Similarity param can be removed.

        One question not related to this patch - just saw it reviewing:

        • it is interesting that SloppyPhraseScorer now extends PhraseScorer but ExactPhraseScorer does not, is this on purpose? Perhaps related do Mike's recent optimizations in this scorer?
        Show
        Doron Cohen added a comment - Patch looks good. +1 for this cleanup which removes calls with null arg and comment {"//similarity not in use"} . Some minor comments: jdoc: in some of the scorer constructors a Weight param was added but existing jdocs for the costructor which document (some) params was not updated to also mention the weight. I am not 100% sure this should be fixed, as there is inconsistency in the level of jdoc for scorer implementations. So if there were no jdocs at all there I would say nothing, but since there were some now they became less complete... ExactPhraseScorer is created with both Weight and Similarity - I think the Similarity param can be removed as part of this cleanup. Same for SloppyPhraseScorer, PhraseScorer, SpanScorer, TermScorer, MatchAllDocsScorer - Similarity param can be removed. One question not related to this patch - just saw it reviewing: it is interesting that SloppyPhraseScorer now extends PhraseScorer but ExactPhraseScorer does not, is this on purpose? Perhaps related do Mike's recent optimizations in this scorer?
        Hide
        Robert Muir added a comment -

        Thanks for the review Doron!

        jdoc: in some of the scorer constructors a Weight param was added but existing jdocs for the costructor which document (some) params was not updated to also mention the weight. I am not 100% sure this should be fixed, as there is inconsistency in the level of jdoc for scorer implementations. So if there were no jdocs at all there I would say nothing, but since there were some now they became less complete...

        I'll fix this, thanks!

        ExactPhraseScorer is created with both Weight and Similarity - I think the Similarity param can be removed as part of this cleanup.
        Same for SloppyPhraseScorer, PhraseScorer, SpanScorer, TermScorer, MatchAllDocsScorer - Similarity param can be removed.

        These still need the Similarity param? They use it in scoring, its just they don't pass it to the superclass constructor (Scorer's constructor).

        Its possible I misunderstood your idea though. Lets take the TermQuery example, are you suggesting that
        we should expose TermWeight's "Similarity" and just pass TermWeight to TermScorer (requiring TermScorer
        to take a TermWeight in its ctor instead of Weight + Similarity?) Currently TermWeight's local copy of
        Similarity, which it uses to compute IDF, is private.

        it is interesting that SloppyPhraseScorer now extends PhraseScorer but ExactPhraseScorer does not, is this on purpose? Perhaps related do Mike's recent optimizations in this scorer?

        Yes, that's correct. Just at a glance he might have done this so that ExactPhraseScorer can compute a
        score cache like TermScorer, and other similar optimizations since the "tf" values are really integers for
        this exact case.

        It might be that if we look at splitting calculations and matching out from Scorer, that we can make
        these "matchers" like ExactPhrase/SloppyPhrase simpler, and we could then clean up... not sure though!

        Show
        Robert Muir added a comment - Thanks for the review Doron! jdoc: in some of the scorer constructors a Weight param was added but existing jdocs for the costructor which document (some) params was not updated to also mention the weight. I am not 100% sure this should be fixed, as there is inconsistency in the level of jdoc for scorer implementations. So if there were no jdocs at all there I would say nothing, but since there were some now they became less complete... I'll fix this, thanks! ExactPhraseScorer is created with both Weight and Similarity - I think the Similarity param can be removed as part of this cleanup. Same for SloppyPhraseScorer, PhraseScorer, SpanScorer, TermScorer, MatchAllDocsScorer - Similarity param can be removed. These still need the Similarity param? They use it in scoring, its just they don't pass it to the superclass constructor (Scorer's constructor). Its possible I misunderstood your idea though. Lets take the TermQuery example, are you suggesting that we should expose TermWeight's "Similarity" and just pass TermWeight to TermScorer (requiring TermScorer to take a TermWeight in its ctor instead of Weight + Similarity?) Currently TermWeight's local copy of Similarity, which it uses to compute IDF, is private. it is interesting that SloppyPhraseScorer now extends PhraseScorer but ExactPhraseScorer does not, is this on purpose? Perhaps related do Mike's recent optimizations in this scorer? Yes, that's correct. Just at a glance he might have done this so that ExactPhraseScorer can compute a score cache like TermScorer, and other similar optimizations since the "tf" values are really integers for this exact case. It might be that if we look at splitting calculations and matching out from Scorer, that we can make these "matchers" like ExactPhrase/SloppyPhrase simpler, and we could then clean up... not sure though!
        Hide
        Doron Cohen added a comment -

        These still need the Similarity param? They use it in scoring, its just they don't pass it to the superclass constructor (Scorer's constructor).
        ... Currently TermWeight's local copy of Similarity, which it uses to compute IDF, is private.

        You're right, I was for some reason under the impression that part of the reason for the change is that Weight already exposes Similarity but it is not, and I think it shouldn't, so current patch is good here.

        Show
        Doron Cohen added a comment - These still need the Similarity param? They use it in scoring, its just they don't pass it to the superclass constructor (Scorer's constructor). ... Currently TermWeight's local copy of Similarity, which it uses to compute IDF, is private. You're right, I was for some reason under the impression that part of the reason for the change is that Weight already exposes Similarity but it is not, and I think it shouldn't, so current patch is good here.
        Hide
        Robert Muir added a comment -

        You're right, I was for some reason under the impression that part of the reason for the change is that Weight already exposes Similarity but it is not, and I think it shouldn't, so current patch is good here.

        Yes, mainly it was just to correct the inconsistencies:

        • the base class does nothing with Similarity, and its often null, so i don't think it needs to know about it.
        • the base class does however, provide functionality that depends on Weight, and I think it should be non-null to support that. We might want to expand on this functionality in the future too.

        I wish i could have thrown IAE on null Weight (I did this to find all scorers that did not comply), but there is again
        the one ValueSourceScorer in Solr that isn't easily fixed... at least the patch creates more consistency.

        I'll review all javadocs for the scorers, and try to make sure the param documentation is up to date.

        Thanks again for taking the time to review these scoring issues.

        Show
        Robert Muir added a comment - You're right, I was for some reason under the impression that part of the reason for the change is that Weight already exposes Similarity but it is not, and I think it shouldn't, so current patch is good here. Yes, mainly it was just to correct the inconsistencies: the base class does nothing with Similarity, and its often null, so i don't think it needs to know about it. the base class does however, provide functionality that depends on Weight, and I think it should be non-null to support that. We might want to expand on this functionality in the future too. I wish i could have thrown IAE on null Weight (I did this to find all scorers that did not comply), but there is again the one ValueSourceScorer in Solr that isn't easily fixed... at least the patch creates more consistency. I'll review all javadocs for the scorers, and try to make sure the param documentation is up to date. Thanks again for taking the time to review these scoring issues.
        Hide
        Robert Muir added a comment -

        Updated patch: with javadocs inconsistencies corrected.

        Show
        Robert Muir added a comment - Updated patch: with javadocs inconsistencies corrected.
        Hide
        Robert Muir added a comment -

        Committed revision 1061499.

        Will work on adding the @deprecated and fixing javadocs and null Weights in branch_3x,
        but we need to provide Similarity where we were providing it before for backwards compatibility.

        Show
        Robert Muir added a comment - Committed revision 1061499. Will work on adding the @deprecated and fixing javadocs and null Weights in branch_3x, but we need to provide Similarity where we were providing it before for backwards compatibility.
        Hide
        Robert Muir added a comment -

        backported to 3.x: revision 1061615.

        Show
        Robert Muir added a comment - backported to 3.x: revision 1061615.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Robert Muir
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development