  Lucene - Core / LUCENE-5221

SimilarityBase.computeNorm is inconsistent with TFIDFSimilarity

Details

    • New

    Description

      The Javadoc of SimilarityBase.computeNorm states that the document length is encoded the same way as in TFIDFSimilarity. However, when discountOverlaps is false, the value that gets encoded is SmallFloat.floatToByte315(boost / (float) Math.sqrt(length / boost)) rather than SmallFloat.floatToByte315(boost / (float) Math.sqrt(length)), where length is state.getLength() and boost is state.getBoost(). The difference comes from the extra / state.getBoost() term in SimilarityBase.computeNorm:

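      final float numTerms;
      if (discountOverlaps)
        numTerms = state.getLength() - state.getNumOverlap();
      else
        // dividing by the boost here is the extra term described above
        numTerms = state.getLength() / state.getBoost();
      // encodeNormValue(boost, length) encodes SmallFloat.floatToByte315(boost / (float) Math.sqrt(length))
      return encodeNormValue(state.getBoost(), numTerms);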
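The arithmetic behind the report can be checked without Lucene on the classpath. The sketch below is a hypothetical standalone class (NormComparison and its method names are made up for illustration, not Lucene API) that reproduces only the float value handed to SmallFloat.floatToByte315 in each case and leaves out the one-byte quantization itself. Algebraically, boost / sqrt(length / boost) equals boost^1.5 / sqrt(length), so the two values agree only when boost is 1.

```java
// Hypothetical standalone sketch, not Lucene code: compares the float value that
// SimilarityBase.computeNorm passes to SmallFloat.floatToByte315 (discountOverlaps == false)
// with the boost / sqrt(length) value described for TFIDFSimilarity.
// The SmallFloat one-byte quantization is intentionally omitted.
public class NormComparison {

    // Mirrors SimilarityBase with discountOverlaps == false:
    // numTerms = length / boost, then boost / sqrt(numTerms)
    static float similarityBaseNorm(float length, float boost) {
        float numTerms = length / boost; // the extra "/ state.getBoost()" term
        return boost / (float) Math.sqrt(numTerms);
    }

    // The value the Javadoc promises, matching TFIDFSimilarity: boost / sqrt(length)
    static float tfidfNorm(float length, float boost) {
        return boost / (float) Math.sqrt(length);
    }

    public static void main(String[] args) {
        // {length, boost} pairs; only the first uses a neutral boost of 1
        float[][] cases = { {100f, 1f}, {100f, 4f}, {8f, 2f} };
        for (float[] c : cases) {
            float length = c[0];
            float boost = c[1];
            System.out.printf("length=%.0f boost=%.0f  SimilarityBase=%.4f  TFIDF=%.4f%n",
                    length, boost, similarityBaseNorm(length, boost), tfidfNorm(length, boost));
        }
    }
}
```

With length 100 and boost 4, the SimilarityBase path yields 4 / sqrt(25) = 0.8 while the TFIDF-style value is 4 / sqrt(100) = 0.4, so an index-time boost is effectively applied with exponent 1.5. Under this reading, making the else branch use state.getLength() alone would bring the two in line.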

      Attachments

        1. LUCENE-5221.patch (3 kB), Robert Muir
        2. LUCENE-5221.patch (0.6 kB), Yubin Kim


          People

            Assignee: Unassigned
            Reporter: Yubin Kim (shdwfeather)
            Votes: 0
            Watchers: 3
