[LUCENE-1581] LowerCaseFilter should be able to be configured to use a specific locale. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Lucene Fields: New

Description

Since I am a .NET programmer, the sample code is in C#, but it should not be hard to follow.

Assume an input text like "İ" and an analyzer like the one below:

    public class SomeAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            TokenStream t = new SomeTokenizer(reader);
            t = new Lucene.Net.Analysis.ASCIIFoldingFilter(t);
            t = new LowerCaseFilter(t);
            return t;
        }
    }

ASCIIFoldingFilter will return "I", and LowerCaseFilter will then return
"i" (if the locale is "en-US")
or
"ı" (if the locale is "tr-TR"), which means this token would have to be passed through another instance of ASCIIFoldingFilter.

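The locale dependence described above can be reproduced outside Lucene. The following is a minimal sketch in Java (the language of Lucene core), not part of the proposed change; the class name `LocaleLowerCaseDemo` and the helper `lower` are made up for illustration. It relies only on the standard `String.toLowerCase(Locale)` method.

```java
import java.util.Locale;

final class LocaleLowerCaseDemo {
    // Lowercases a string using an explicitly supplied locale
    // instead of the JVM default.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");

        // Under en-US, capital I maps to the ordinary dotted i.
        System.out.println(lower("I", Locale.US).equals("i"));      // true

        // Under tr-TR, capital I maps to dotless i (U+0131).
        System.out.println(lower("I", turkish).equals("\u0131"));   // true
    }
}
```

The same input therefore produces a different token depending on the locale in effect, which is why the filter's output can vary between machines when it uses the current locale.
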
So calling LowerCaseFilter before ASCIIFoldingFilter would be one solution, but a better approach would be to add
a new constructor to LowerCaseFilter that forces it to use a specific locale.

    public sealed class LowerCaseFilter : TokenFilter
    {
        /* +++ */ System.Globalization.CultureInfo CultureInfo = System.Globalization.CultureInfo.CurrentCulture;

        public LowerCaseFilter(TokenStream input) : base(input)
        {
        }

        /* +++ */ public LowerCaseFilter(TokenStream input, System.Globalization.CultureInfo CultureInfo) : base(input)
        /* +++ */ {
        /* +++ */     this.CultureInfo = CultureInfo;
        /* +++ */ }

        public override Token Next(Token result)
        {
            result = Input.Next(result);
            if (result != null)
            {
                char[] buffer = result.TermBuffer();
                int length = result.termLength;
                for (int i = 0; i < length; i++)
                    /* +++ */ buffer[i] = System.Char.ToLower(buffer[i], CultureInfo);

                return result;
            }
            else
                return null;
        }
    }
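
To illustrate why the order of the two filters matters, here is a small Java sketch under stated assumptions: `fold` is a hypothetical stand-in for ASCIIFoldingFilter that handles only the two Turkish letters discussed in this issue, and the class and method names are invented for the example. It is not Lucene code.

```java
import java.util.Locale;

final class FilterOrderSketch {
    // Hypothetical, deliberately tiny stand-in for ASCIIFoldingFilter:
    // maps dotted capital I (U+0130) to "I" and dotless i (U+0131) to "i".
    static String fold(String s) {
        return s.replace('\u0130', 'I').replace('\u0131', 'i');
    }

    // Order used by the analyzer in this issue: fold first, then lowercase.
    static String foldThenLower(String s, Locale locale) {
        return fold(s).toLowerCase(locale);
    }

    // Alternative order: lowercase first, then fold.
    static String lowerThenFold(String s, Locale locale) {
        return fold(s.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");
        String input = "\u0130"; // dotted capital I

        // Fold first: the result depends on the locale used for lowercasing.
        System.out.println(foldThenLower(input, Locale.US).equals("i"));     // true
        System.out.println(foldThenLower(input, turkish).equals("\u0131"));  // true

        // Lowercase first under tr-TR: the fold step leaves a plain "i".
        System.out.println(lowerThenFold(input, turkish).equals("i"));       // true
    }
}
```

Passing the locale explicitly, as the extra constructor would allow, makes the fold-then-lowercase order produce the same token regardless of the machine's current locale.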

DIGY

Attachments

TestTurkishCollation.java
13/Jun/09 02:37
1 kB
Robert Muir

People

Assignee: Unassigned

Reporter: Digy

Votes: 0

Watchers: 0

Dates

Created: 29/Mar/09 00:13

Updated: 28/Aug/22 11:59

Resolved: 01/Dec/09 20:41