Lucene - Core / LUCENE-5221

SimilarityBase.computeNorm is inconsistent with TFIDFSimilarity

    Details

    • Lucene Fields:
      New

      Description

      The Javadoc for SimilarityBase.computeNorm says that the document length should be encoded in the same way as in TFIDFSimilarity. However, when discountOverlaps is false, the value that gets encoded is SmallFloat.floatToByte315(boost / (float) Math.sqrt(length / boost)) rather than SmallFloat.floatToByte315(boost / (float) Math.sqrt(length)). The cause is the extra / state.getBoost() term in SimilarityBase.computeNorm:

      final float numTerms;
      if (discountOverlaps)
        numTerms = state.getLength() - state.getNumOverlap();
      else
        numTerms = state.getLength() / state.getBoost();
      return encodeNormValue(state.getBoost(), numTerms);
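
      To make the discrepancy concrete, here is a minimal standalone sketch that compares the two pre-encoding values described above. It does not use Lucene: the class and method names (NormComparison, expected, actual) are hypothetical, the final SmallFloat.floatToByte315 step is left out, and the formulas are taken directly from this description rather than from the Lucene source.

```java
public class NormComparison {

    // Value the description says should be encoded (TFIDFSimilarity style):
    // boost / sqrt(length). Hypothetical helper, not a Lucene method.
    public static float expected(float boost, int length) {
        return boost / (float) Math.sqrt(length);
    }

    // Value that ends up being encoded when discountOverlaps is false:
    // numTerms has already been divided by the boost, so the boost is
    // applied twice. Hypothetical helper, not a Lucene method.
    public static float actual(float boost, int length) {
        float numTerms = length / boost;
        return boost / (float) Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        float boost = 2.0f;   // example field boost (assumed value)
        int length = 100;     // example field length (assumed value)
        System.out.println("expected = " + expected(boost, length));
        System.out.println("actual   = " + actual(boost, length));
    }
}
```

      With a boost of 1 the two values coincide, so the difference only shows up for boosted fields. For a boost of 2 and a length of 100 they are roughly 0.2 and 0.28.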

        Attachments

        1. LUCENE-5221.patch
          3 kB
          Robert Muir
        2. LUCENE-5221.patch
          0.6 kB
          Yubin Kim

            People

            • Assignee:
              Unassigned
              Reporter:
              shdwfeather Yubin Kim