  1. Lucene - Core
  2. LUCENE-1581

LowerCaseFilter should be able to be configured to use a specific locale.


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix

    Description

      Since I am a .NET programmer, the sample code is in C#, but it should not be hard to follow.

      Assume an input text like "İ" and an analyzer like the one below:

          public class SomeAnalyzer : Analyzer
          {
              public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
              {
                  TokenStream t = new SomeTokenizer(reader);
                  t = new Lucene.Net.Analysis.ASCIIFoldingFilter(t);
                  t = new LowerCaseFilter(t);
                  return t;
              }
          }
      

      ASCIIFoldingFilter will return "I". LowerCaseFilter will then return
      "i" if the locale is "en-US", or
      "ı" if the locale is "tr-TR". In the Turkish case the token is no longer ASCII, so it would have to be passed through another instance of ASCIIFoldingFilter.
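
      The locale dependence described above can be reproduced in plain Java, the language of Lucene Core itself. This is only a sketch: the fold method is a hypothetical stand-in for ASCIIFoldingFilter (it decomposes the text and drops combining marks, which is enough for "İ" but is not the real filter), and the class and method names are made up for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FoldThenLowerDemo {
    // Hypothetical stand-in for ASCIIFoldingFilter: decompose to NFD,
    // then remove all combining marks. Not the real Lucene filter.
    public static String fold(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Mirrors the analyzer order from the description: fold first, then lowercase.
    public static String foldThenLower(String text, Locale locale) {
        return fold(text).toLowerCase(locale);
    }

    public static void main(String[] args) {
        String input = "\u0130"; // capital I with dot above
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // With an English locale the result is the ASCII letter i (U+0069).
        System.out.println(foldThenLower(input, Locale.US));
        // With a Turkish locale the result is the dotless i (U+0131), which is not ASCII.
        System.out.println(foldThenLower(input, turkish));
    }
}
```

      The same input therefore produces different index terms depending only on the locale the lowercasing step runs under.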

      Calling LowerCaseFilter before ASCIIFoldingFilter would be one solution, but a better approach would be to add
      a new constructor to LowerCaseFilter that forces it to use a specific locale.

          public sealed class LowerCaseFilter : TokenFilter
          {
              // Culture used for lowercasing; defaults to the current culture.
              /* +++ */System.Globalization.CultureInfo CultureInfo = System.Globalization.CultureInfo.CurrentCulture;

              // "in" is a reserved word in C#, so the parameter is named "input".
              public LowerCaseFilter(TokenStream input) : base(input)
              {
              }

              /* +++ */  public LowerCaseFilter(TokenStream input, System.Globalization.CultureInfo CultureInfo) : base(input)
              /* +++ */  {
              /* +++ */      this.CultureInfo = CultureInfo;
              /* +++ */  }

              public override Token Next(Token result)
              {
                  result = input.Next(result);
                  if (result != null)
                  {
                      // Lowercase the term buffer in place using the configured culture.
                      char[] buffer = result.TermBuffer();
                      int length = result.termLength;
                      for (int i = 0; i < length; i++)
                          /* +++ */ buffer[i] = System.Char.ToLower(buffer[i], CultureInfo);

                      return result;
                  }
                  else
                      return null;
              }
          }
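
      For comparison, a rough Java counterpart of the idea might look like the sketch below. It is not Lucene code: LocaleLowerCaser is a hypothetical helper that only shows a lowercasing step bound to a locale chosen at construction time. Java has no per-character, locale-aware lowercase method, so the sketch goes through String.toLowerCase(Locale) and returns a new array instead of writing into the buffer.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: lowercases term buffers
// with a locale fixed at construction time.
public class LocaleLowerCaser {
    private final Locale locale;

    public LocaleLowerCaser(Locale locale) {
        this.locale = locale;
    }

    // Returns a lowercased copy of the first `length` chars of `buffer`.
    // The result may be longer than the input, so it is not written in place.
    public char[] lowerCase(char[] buffer, int length) {
        return new String(buffer, 0, length).toLowerCase(locale).toCharArray();
    }

    public static void main(String[] args) {
        char[] term = {'I', 'S', 'I'};
        LocaleLowerCaser english = new LocaleLowerCaser(Locale.ENGLISH);
        LocaleLowerCaser turkish = new LocaleLowerCaser(Locale.forLanguageTag("tr-TR"));

        // English rules map I to the ASCII i; Turkish rules map it to the dotless i (U+0131).
        System.out.println(new String(english.lowerCase(term, term.length)));
        System.out.println(new String(turkish.lowerCase(term, term.length)));
    }
}
```

      One design point worth noting: with String.toLowerCase the length can change. Under a non-Turkish locale, "İ" (U+0130) lowercases to "i" followed by a combining dot above (U+0307), which is why the sketch returns a copy rather than overwriting the original buffer.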
      

      DIGY

      Attachments

        1. TestTurkishCollation.java
          1 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Digy