Lucene - Core / LUCENE-5005

Length norm value of DefaultSimilarity for a few terms

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 4.0
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

    Description

      The lengthNorm method of DefaultSimilarity is as follows:

        public float lengthNorm(FieldInvertState state) {
          final int numTerms;
          if (discountOverlaps)
            numTerms = state.getLength() - state.getNumOverlap();
          else
            numTerms = state.getLength();
          return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
        }
      

      The return value is determined by (1.0 / Math.sqrt(numTerms)), multiplied by the boost.
      Its type is float, but the value is stored as a single byte, encoded by SmallFloat.floatToByte315, so precision is lost.

      term count   1/sqrt(numTerms)   1/sqrt(numTerms) after byte encoding
      1            1.000000           1.0000
      2            0.707107           0.6250
      3            0.577350           0.5000
      4            0.500000           0.5000
      5            0.447214           0.4375

      The length norm of 3 terms is the same as that of 4 terms.
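
The collision above can be reproduced without a Lucene dependency. The following is a minimal sketch, not Lucene's code: the class NormQuantizationSketch and its methods are hypothetical names, and it assumes that, for values in this range, encoding to a byte and decoding back amounts to keeping the float's sign, exponent, and top two mantissa bits while discarding the lower 21 bits. The boost is assumed to be 1.0.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: mimics the precision loss of the
// one-byte norm encoding for values in the normal range of that format.
final class NormQuantizationSketch {

    // Assumption: an encode/decode round trip keeps sign, exponent, and the
    // top 2 mantissa bits of the float, and zeroes the lower 21 bits.
    static float roundTrip(float f) {
        int bits = Float.floatToRawIntBits(f);
        return Float.intBitsToFloat(bits & ~((1 << 21) - 1));
    }

    // Raw length norm with a boost of 1.0, as in the quoted lengthNorm method.
    static float rawLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Print term count, raw norm, and the norm after the simulated round trip.
        for (int n = 1; n <= 5; n++) {
            float raw = rawLengthNorm(n);
            System.out.printf(Locale.ROOT, "%d %.6f %.4f%n", n, raw, roundTrip(raw));
        }
    }
}
```

Under that assumption, only four distinct values survive between 0.5 and 1.0 (0.5, 0.625, 0.75, 0.875), so 0.577350 for 3 terms is truncated down to 0.5, the same value that 4 terms produce exactly.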

          People

            Assignee: Unassigned
            Reporter: Shingo Sasaki (sasashin)
            Votes: 1
            Watchers: 3
