Lucene - Core / LUCENE-1150

The token types of the standard tokenizer are not accessible

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.3.2, 2.4
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      Because StandardTokenizerImpl is not public, these token types are not accessible:

      public static final int ALPHANUM          = 0;
      public static final int APOSTROPHE        = 1;
      public static final int ACRONYM           = 2;
      public static final int COMPANY           = 3;
      public static final int EMAIL             = 4;
      public static final int HOST              = 5;
      public static final int NUM               = 6;
      public static final int CJ                = 7;
      /**
       * @deprecated this solves a bug where HOSTs that end with '.' are identified
       *             as ACRONYMs. It is deprecated and will be removed in the next
       *             release.
       */
      public static final int ACRONYM_DEP       = 8;
      
      public static final String [] TOKEN_TYPES = new String [] {
          "<ALPHANUM>",
          "<APOSTROPHE>",
          "<ACRONYM>",
          "<COMPANY>",
          "<EMAIL>",
          "<HOST>",
          "<NUM>",
          "<CJ>",
          "<ACRONYM_DEP>"
      };
      

      So no custom TokenFilter can be based on the token type. In fact, even the StandardFilter could not be written outside the org.apache.lucene.analysis.standard package.
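
      To illustrate the kind of filtering this would enable, here is a minimal sketch that assumes the constants above were exposed in a public class. TokenTypesSketch, SimpleToken and keepOnly are hypothetical names used only for illustration and are not part of Lucene; a real filter would presumably extend TokenFilter and compare against Token.type() instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the Lucene API: copies the constants quoted in
// this issue into an accessible class so that code outside the
// org.apache.lucene.analysis.standard package can refer to them.
final class TokenTypesSketch {
    static final int ALPHANUM   = 0;
    static final int APOSTROPHE = 1;
    static final int ACRONYM    = 2;
    static final int COMPANY    = 3;
    static final int EMAIL      = 4;
    static final int HOST       = 5;
    static final int NUM        = 6;
    static final int CJ         = 7;

    // Indexed by the int constants above, e.g. TOKEN_TYPES[EMAIL] is "<EMAIL>".
    static final String[] TOKEN_TYPES = {
        "<ALPHANUM>", "<APOSTROPHE>", "<ACRONYM>", "<COMPANY>",
        "<EMAIL>", "<HOST>", "<NUM>", "<CJ>", "<ACRONYM_DEP>"
    };

    // Simplified token (assumption): term text plus the tokenizer's type string.
    static final class SimpleToken {
        final String text;
        final String type;

        SimpleToken(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Keeps only the tokens whose type string matches the given type constant.
    static List<SimpleToken> keepOnly(List<SimpleToken> tokens, int typeId) {
        String wanted = TOKEN_TYPES[typeId];
        List<SimpleToken> kept = new ArrayList<>();
        for (SimpleToken token : tokens) {
            if (wanted.equals(token.type)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

      With the constants reachable, a caller could write keepOnly(tokens, TokenTypesSketch.EMAIL) instead of hard-coding the "<EMAIL>" string.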

      Attachments

        1. LUCENE-1150.patch
          7 kB
          Michael McCandless
        2. LUCENE-1150.take2.patch
          14 kB
          Michael McCandless

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Nicolas Lalevée (hibou)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: