[LUCENE-1150] The token types of the standard tokenizer is not accessible - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3
Fix Version/s: 2.3.2, 2.4
Component/s: modules/analysis
Labels: None
Lucene Fields: New

Description

Because StandardTokenizerImpl is not public, these token types are not accessible:

public static final int ALPHANUM          = 0;
public static final int APOSTROPHE        = 1;
public static final int ACRONYM           = 2;
public static final int COMPANY           = 3;
public static final int EMAIL             = 4;
public static final int HOST              = 5;
public static final int NUM               = 6;
public static final int CJ                = 7;
/**
 * @deprecated this solves a bug where HOSTs that end with '.' are identified
 *             as ACRONYMs. It is deprecated and will be removed in the next
 *             release.
 */
public static final int ACRONYM_DEP       = 8;

public static final String [] TOKEN_TYPES = new String [] {
    "<ALPHANUM>",
    "<APOSTROPHE>",
    "<ACRONYM>",
    "<COMPANY>",
    "<EMAIL>",
    "<HOST>",
    "<NUM>",
    "<CJ>",
    "<ACRONYM_DEP>"
};

So no custom TokenFilter can be based on the token type. In fact, even StandardFilter itself could not be written outside the org.apache.lucene.analysis.standard package.
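To illustrate the kind of filter this blocks, here is a minimal sketch of type-based filtering. It deliberately does not use Lucene's classes: `Token`, `dropTypes`, and `TokenTypeFilterSketch` are hypothetical stand-ins invented for this example, and only the type labels are taken from the TOKEN_TYPES array above. A real filter would extend TokenFilter and would need those labels to be reachable from outside the package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenTypeFilterSketch {

    // Type labels copied from the TOKEN_TYPES array in the description.
    public static final String ALPHANUM = "<ALPHANUM>";
    public static final String EMAIL = "<EMAIL>";
    public static final String HOST = "<HOST>";

    /** Hypothetical stand-in for a token: its text plus its type label. */
    public static final class Token {
        public final String text;
        public final String type;

        public Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** Returns the tokens whose type is not in typesToDrop, preserving order. */
    public static List<Token> dropTypes(List<Token> tokens, Set<String> typesToDrop) {
        List<Token> kept = new ArrayList<>();
        for (Token token : tokens) {
            if (!typesToDrop.contains(token.type)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
            new Token("contact", ALPHANUM),
            new Token("dev@example.org", EMAIL),
            new Token("example.org", HOST));

        // Drop e-mail addresses and host names, keep everything else.
        Set<String> drop = new HashSet<>(Arrays.asList(EMAIL, HOST));
        for (Token token : dropTypes(tokens, drop)) {
            System.out.println(token.text + " " + token.type);
        }
    }
}
```

The filter compares against the string labels rather than the int constants because a filter outside the package only ever sees the type as a label on the token. Without public access to those labels it would have to hard-code them, as this sketch does.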

Attachments

- LUCENE-1150.patch, 25/Jan/08 12:40, 7 kB, Michael McCandless
- LUCENE-1150.take2.patch, 25/Jan/08 19:16, 14 kB, Michael McCandless

People

Assignee: Michael McCandless

Reporter: Nicolas Lalevée

Votes: 0

Watchers: 0

Dates

Created: 25/Jan/08 10:16

Updated: 28/Aug/22 11:45

Resolved: 15/Apr/08 09:09