  1. Lucene - Core
  2. LUCENE-5005

Length norm value of DefaultSimilarity for a few terms

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 4.0
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

      Description

      The lengthNorm method of DefaultSimilarity is as follows:

        public float lengthNorm(FieldInvertState state) {
          final int numTerms;
          if (discountOverlaps)
            // exclude overlapping tokens (position increment 0) from the count
            numTerms = state.getLength() - state.getNumOverlap();
          else
            numTerms = state.getLength();
          return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
        }
      

      Apart from the boost, the return value is determined by (1.0 / Math.sqrt(numTerms)).
      Its type is float, but the value is encoded into a single byte by SmallFloat.floatToByte315, which loses precision.

      term count   1/sqrt(numTerms)   1/sqrt(numTerms) after byte encoding
      1            1.000000           1.0000
      2            0.707107           0.6250
      3            0.577350           0.5000
      4            0.500000           0.5000
      5            0.447214           0.4375

      The length norm of 3 terms is the same as that of 4 terms.
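
      The collision can be reproduced without Lucene on the classpath. The sketch below is a local re-implementation of the byte encoding, assumed to mirror SmallFloat.floatToByte315 and SmallFloat.byte315ToFloat in Lucene 4.0. The class name LengthNormDemo and the helpers lengthNorm(int) and storedNorm(int) are hypothetical, and the boost is assumed to be 1.0.

```java
import java.util.Locale;

// Sketch only: re-implements the 8-bit norm encoding locally so that it runs
// without Lucene. Class and helper names are hypothetical.
class LengthNormDemo {

    // Assumed to mirror SmallFloat.floatToByte315: keep the top bits of the
    // float and re-base the exponent so the result fits in one byte.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // zero or negative maps to 0; tiny positive values map to 1
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // too large: clamp to the largest code
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Assumed to mirror SmallFloat.byte315ToFloat: the inverse mapping.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Length norm before encoding, with the boost assumed to be 1.0.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Length norm as it would be read back after the byte round trip.
    static float storedNorm(int numTerms) {
        return byte315ToFloat(floatToByte315(lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf(Locale.ROOT, "%d %.6f %.4f%n",
                    n, lengthNorm(n), storedNorm(n));
        }
    }
}
```

      Under these assumptions, storedNorm(3) and storedNorm(4) both come back as 0.5, because the encoding truncates the float's low-order mantissa bits rather than rounding.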

            People

            • Assignee: Unassigned
            • Reporter: Shingo Sasaki (sasashin)