Solr / SOLR-4123

ICUTokenizerFactory - per-script RBBI customization

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      This started out as an idea for a configuration knob on ICUTokenizer that would let me tell it not to tokenize on punctuation. Through IRC discussion on #lucene, it sorta ballooned. The committers had a long discussion about it that I don't really understand, so I'll include it in the comments.

      I am a Solr user, so I would also need to be able to set this configuration from Solr, likely in either schema.xml or solrconfig.xml.

      Attachments

        1. SOLR-4123.patch
          22 kB
          Steven Rowe
        2. SOLR-4123.patch
          22 kB
          Steven Rowe
        3. SOLR-4123.patch
          20 kB
          Steven Rowe
        4. SOLR-4123.patch
          20 kB
          Steven Rowe
        5. SOLR-4123.patch
          5 kB
          Robert Muir

        Activity

          People

            Assignee: Steven Rowe (sarowe)
            Reporter: Shawn Heisey (elyograg)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: