Lucene.Net / LUCENENET-337

TokenAttribute for Selectively Including Tokens in Length Norm


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: Lucene.Net 3.6
    • Component/s: Lucene.Net Core
    • Labels: None

    Description

      This patch adds functionality to Lucene.Net that allows a TokenFilter to mark a Token as excluded from the length norm calculation, through a new TokenAttribute interface, LengthNormAttribute, and a corresponding implementation, LengthNormAttributeImpl. This is useful for keeping injected synonyms from inflating the token count used for the length norm, particularly when the number of synonyms is large relative to the number of original tokens.
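
      To make the effect concrete, here is a minimal sketch of the arithmetic. It assumes the 1 / sqrt(numTerms) length norm used by DefaultSimilarity; the class and method names are hypothetical and are not part of the patch.

```csharp
using System;

// Hypothetical helper, not part of the patch. Assumes the
// 1 / sqrt(numTerms) length norm used by DefaultSimilarity.
public static class LengthNormSketch
{
    // numTerms is the number of tokens counted for the field.
    public static double LengthNorm(int numTerms)
    {
        return 1.0 / Math.Sqrt(numTerms);
    }

    // Norm when every token (original and injected) is counted.
    public static double NormCountingAll(int originalTokens, int injectedSynonyms)
    {
        return LengthNorm(originalTokens + injectedSynonyms);
    }

    // Norm when injected synonyms are left out of the count.
    public static double NormExcludingSynonyms(int originalTokens)
    {
        return LengthNorm(originalTokens);
    }
}
```

      Under that assumption, a field with 4 original tokens and 12 injected synonyms gets a norm of 0.25 when all 16 tokens are counted, and 0.5 when only the 4 original tokens are counted.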

      Following is an example of how to use the new attribute.

      Within your custom TokenFilter, define a field to hold a reference to the attribute and set its value in the constructor. When the stream advances to a new Token within the call to IncrementToken(), set the IncludeInLengthNorm property of the attribute to false for Tokens that should not be included in the length norm calculation. The property defaults to true and is reset to true after each Token is consumed within DocInverterPerField.ProcessFields.

      CustomTokenFilter.cs
      public class CustomTokenFilter : TokenFilter
      {
      	private LengthNormAttribute lnAttribute;
      	
      	public CustomTokenFilter(TokenStream input) : base(input)
      	{
      		this.lnAttribute = (LengthNormAttribute)AddAttribute(typeof(LengthNormAttribute));
      	}
      		
      	public override bool IncrementToken()
      	{
      		if (input.IncrementToken())
      		{
      			// Decide here whether the current token should count
      			// toward the length norm. This example excludes every
      			// token from the length norm calculation.
      			this.lnAttribute.IncludeInLengthNorm = false;
      
      			return true;
      		}
      		else
      		{
      			return false;
      		}
      	}    
      }
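
      The intended effect on the counted field length can be illustrated without Lucene.Net at all. The following is only a sketch: FlaggedToken and FieldLengthSketch are hypothetical stand-ins, and the patch's actual changes to the indexing chain are not reproduced here.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in, not a Lucene.Net type: a token paired with
// the flag a filter would set through the attribute.
public struct FlaggedToken
{
    public string Text;
    public bool IncludeInLengthNorm;

    public FlaggedToken(string text, bool includeInLengthNorm)
    {
        Text = text;
        IncludeInLengthNorm = includeInLengthNorm;
    }
}

public static class FieldLengthSketch
{
    // Counts only the tokens flagged for inclusion. This count stands in
    // for the number of terms handed to the length norm calculation.
    public static int CountedLength(IEnumerable<FlaggedToken> tokens)
    {
        int length = 0;
        foreach (FlaggedToken token in tokens)
        {
            if (token.IncludeInLengthNorm)
            {
                length++;
            }
        }
        return length;
    }
}
```

      For a stream of "car", "auto", "automobile", "wash" in which the two injected synonyms are flagged false, CountedLength returns 2 rather than 4.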
      

      Attachments

        1. LengthNorm.patch
          11 kB
          Michael Garski


          People

            Assignee: Unassigned
            Reporter: Michael Garski (mgarski)
            Votes: 0
            Watchers: 3
