Solr / SOLR-4123

ICUTokenizerFactory - per-script RBBI customization

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      This started out as an idea for a configuration knob on ICUTokenizer that would let me tell it not to tokenize on punctuation. Through IRC discussion on #lucene, it sorta ballooned. The committers had a long discussion about it that I don't fully follow, so I'll include it in the comments.

      I am a Solr user, so I would also need the ability to access the configuration from there, likely either in schema.xml or solrconfig.xml.
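
For illustration, here is a sketch of what such a knob might look like in schema.xml. It assumes the `rulefiles` attribute that ICUTokenizerFactory documents in releases after this fix: a comma-separated list of pairs, each a four-letter ISO 15924 script code, a colon, and a resource path to an RBBI rules file. The field type name is hypothetical, and the rules file (which would have to define whitespace-only breaking for Latin text) is something you would supply yourself; check both against the Solr version you run.

```xml
<!-- Hypothetical field type; the name and the .rbbi file are placeholders. -->
<fieldType name="text_icu_custom" class="solr.TextField">
  <analyzer>
    <!-- rulefiles: comma-separated script:file pairs. "Latn" is the
         ISO 15924 code for Latin script; the file is resolved as a
         resource path and must contain RBBI break rules. -->
    <tokenizer class="solr.ICUTokenizerFactory"
               rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
  </analyzer>
</fieldType>
```

More scripts can be covered by adding further pairs to the same attribute, for example `Latn:my.Latin.rules.rbbi,Cyrl:my.Cyrillic.rules.rbbi` (again, placeholder file names).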

        Attachments

        1. SOLR-4123.patch
          5 kB
          Robert Muir
        2. SOLR-4123.patch
          20 kB
          Steve Rowe
        3. SOLR-4123.patch
          20 kB
          Steve Rowe
        4. SOLR-4123.patch
          22 kB
          Steve Rowe
        5. SOLR-4123.patch
          22 kB
          Steve Rowe

            People

            • Assignee: Steve Rowe (steve_rowe)
            • Reporter: Shawn Heisey (elyograg)
            • Votes: 0
            • Watchers: 4