Details
- Improvement
- Status: Resolved
- Major
- Resolution: Duplicate
- 2.0.0
- None
- None
- New
Description
Some of our Japanese customers are reporting errors when performing searches using half-width characters.
The desired behavior is that a document containing half-width characters is returned when searching with either the full-width equivalents or the half-width characters themselves.
Currently, a search returns no matches for half-width characters.
Here is a test case outlining the desired behavior (this may require a new Analyzer).
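import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;

import junit.framework.TestCase;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;

public class TestJapaneseEncodings extends TestCase {

    // UTF-8 bytes of U+30AB (full-width katakana KA)
    byte[] fullWidthKa = new byte[]{(byte) 0xE3, (byte) 0x82, (byte) 0xAB};
    // UTF-8 bytes of U+FF76 (half-width katakana KA)
    byte[] halfWidthKa = new byte[]{(byte) 0xEF, (byte) 0xBD, (byte) 0xB6};

    // Half-width input should be tokenized as its full-width equivalent.
    public void testAnalyzerWithHalfWidth() throws IOException {
        Reader r1 = new StringReader(makeHalfWidthKa());
        TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
        assertNotNull(stream);
        Token token = stream.next();
        assertNotNull(token);
        assertEquals(makeFullWidthKa(), token.termText());
    }

    // Full-width input should pass through unchanged.
    public void testAnalyzerWithFullWidth() throws IOException {
        Reader r1 = new StringReader(makeFullWidthKa());
        TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
        assertEquals(makeFullWidthKa(), stream.next().termText());
    }

    private String makeFullWidthKa() throws UnsupportedEncodingException {
        return new String(fullWidthKa, "UTF-8");
    }

    private String makeHalfWidthKa() throws UnsupportedEncodingException {
        return new String(halfWidthKa, "UTF-8");
    }
}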
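
For reference, the folding that the test expects can be sketched outside Lucene with Unicode NFKC normalization. This is only an illustration of the mapping, not a proposed patch: the `WidthFolding` class and its `fold` method are hypothetical names, and `java.text.Normalizer` assumes Java 6 or later. Note that NFKC also folds other compatibility characters (for example, full-width Latin letters become ASCII), which may or may not be wanted in an analyzer.

```java
import java.text.Normalizer;

public class WidthFolding {

    // Hypothetical helper: fold width variants to their standard forms using NFKC.
    public static String fold(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String halfWidthKa = "\uFF76"; // half-width katakana KA
        String fullWidthKa = "\u30AB"; // full-width katakana KA

        // Half-width KA folds to full-width KA; full-width KA is left unchanged.
        System.out.println(fold(halfWidthKa).equals(fullWidthKa)); // true
        System.out.println(fold(fullWidthKa).equals(fullWidthKa)); // true
    }
}
```

Applying the same folding at both index time and query time would let either form of the character match the other, which is the behavior described above.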