Lucene - Core
LUCENE-5221

SimilarityBase.computeNorm is inconsistent with TFIDFSimilarity

    Details

    • Lucene Fields:
      New

      Description

      The Javadoc for SimilarityBase.computeNorm states that the document length should be encoded in the same way as in TFIDFSimilarity. However, when discountOverlaps is false, the value that gets encoded is SmallFloat.floatToByte315((boost / (float) Math.sqrt(length / boost))) rather than SmallFloat.floatToByte315((boost / (float) Math.sqrt(length))). The cause is the extra / state.getBoost() term in SimilarityBase.computeNorm:

      final float numTerms;
      if (discountOverlaps)
        numTerms = state.getLength() - state.getNumOverlap();
      else
        numTerms = state.getLength() / state.getBoost();
      return encodeNormValue(state.getBoost(), numTerms);
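
      To make the discrepancy concrete, here is a minimal stand-alone sketch of the two expressions from the description. The class and method names are hypothetical and not part of Lucene, and the final SmallFloat.floatToByte315 step is left out, so only the float values that would be passed to the encoder are compared.

```java
// Hypothetical stand-alone demo; it does not depend on Lucene.
final class NormInconsistencyDemo {

    // Value the description says should be encoded:
    // boost / sqrt(length), as in TFIDFSimilarity.
    static float expectedNorm(float boost, int length) {
        return boost / (float) Math.sqrt(length);
    }

    // Value produced by the quoted branch when discountOverlaps is false:
    // the length is divided by the boost before the square root is taken.
    static float reportedNorm(float boost, int length) {
        float numTerms = length / boost; // the extra "/ state.getBoost()" term
        return boost / (float) Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        float boost = 4f;  // assumed index-time boost, chosen for illustration
        int length = 100;  // assumed field length in terms

        System.out.println("expected: " + expectedNorm(boost, length));
        System.out.println("reported: " + reportedNorm(boost, length));
    }
}
```

      With a boost of 4 and a length of 100 the two values are 0.4 and 0.8, so the boost is effectively applied more strongly than intended. With a boost of 1 the two expressions agree, which may explain why the difference is easy to miss.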

      1. LUCENE-5221.patch
        3 kB
        Robert Muir
      2. LUCENE-5221.patch
        0.6 kB
        Yubin Kim

        Activity

        Robert Muir added a comment -

        +1, thank you for reporting this!

        Yubin Kim added a comment -

        Here's the patch.

        Robert Muir added a comment -

        Thanks again: I added a test for this to your patch.

        ASF subversion and git services added a comment -

        Commit 1524457 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1524457 ]

        LUCENE-5221: SimilarityBase.computeNorm is inconsistent with TFIDFSimilarity

        ASF subversion and git services added a comment -

        Commit 1524463 from Robert Muir in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1524463 ]

        LUCENE-5221: SimilarityBase.computeNorm is inconsistent with TFIDFSimilarity

        Yubin Kim added a comment -

        Thanks for the speedy response and commit!

        ASF subversion and git services added a comment -

        Commit 1524467 from Robert Muir in branch 'dev/branches/lucene_solr_4_5'
        [ https://svn.apache.org/r1524467 ]

        LUCENE-5221: SimilarityBase.computeNorm is inconsistent with TFIDFSimilarity

        Adrien Grand added a comment -

        4.5 release -> bulk close


          People

          • Assignee:
            Unassigned
            Reporter:
            Yubin Kim
          • Votes:
            0
            Watchers:
            3
