Details
Bug
Status: Closed
Major
Resolution: Fixed
2.3
None
New
Description
Because StandardTokenizerImpl is not public, the following token type constants are not accessible from outside its package:
public static final int ALPHANUM = 0;
public static final int APOSTROPHE = 1;
public static final int ACRONYM = 2;
public static final int COMPANY = 3;
public static final int EMAIL = 4;
public static final int HOST = 5;
public static final int NUM = 6;
public static final int CJ = 7;

/**
 * @deprecated this solves a bug where HOSTs that end with '.' are identified
 *             as ACRONYMs. It is deprecated and will be removed in the next
 *             release.
 */
public static final int ACRONYM_DEP = 8;

public static final String [] TOKEN_TYPES = new String [] {
    "<ALPHANUM>",
    "<APOSTROPHE>",
    "<ACRONYM>",
    "<COMPANY>",
    "<EMAIL>",
    "<HOST>",
    "<NUM>",
    "<CJ>",
    "<ACRONYM_DEP>"
};
As a result, no custom TokenFilter can be based on the token type without hard-coding the type strings. In fact, even the StandardFilter could not be written outside the org.apache.lucene.analysis.standard package.
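To illustrate what is being asked for, here is a minimal, self-contained sketch of how code outside the package might use these constants if they were reachable. It does not depend on Lucene: the class name `TokenTypes` and the helpers `label` and `keepOfType` are hypothetical and exist only for this example, and tokens are modelled as parallel arrays of term text and type label rather than real Token objects.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical holder mirroring the constants listed in this issue.
 * In a real fix it would be declared public; it is left package-private
 * here only so the snippet compiles regardless of the file name.
 */
final class TokenTypes {
    static final int ALPHANUM = 0;
    static final int APOSTROPHE = 1;
    static final int ACRONYM = 2;
    static final int COMPANY = 3;
    static final int EMAIL = 4;
    static final int HOST = 5;
    static final int NUM = 6;
    static final int CJ = 7;
    static final int ACRONYM_DEP = 8;

    static final String[] TOKEN_TYPES = {
        "<ALPHANUM>", "<APOSTROPHE>", "<ACRONYM>", "<COMPANY>", "<EMAIL>",
        "<HOST>", "<NUM>", "<CJ>", "<ACRONYM_DEP>"
    };

    private TokenTypes() {}

    /** Returns the type label for a constant, e.g. EMAIL gives "<EMAIL>". */
    static String label(int type) {
        return TOKEN_TYPES[type];
    }

    /**
     * Keeps only the terms whose type label matches the wanted type.
     * types[i] is assumed to be the type label of terms[i].
     */
    static List<String> keepOfType(String[] terms, String[] types, int wanted) {
        String wantedLabel = label(wanted);
        List<String> kept = new ArrayList<String>();
        for (int i = 0; i < terms.length; i++) {
            if (wantedLabel.equals(types[i])) {
                kept.add(terms[i]);
            }
        }
        return kept;
    }
}
```

With the constants reachable, a filter can select tokens by type without repeating literals such as "<EMAIL>" in its own source, which is the coupling this issue is about.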