Jackrabbit Oak / OAK-3648

Use StandardTokenizer instead of ClassicTokenizer in OakAnalyzer


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.34, 1.2.19, 1.3.11, 1.4
    • Component/s: lucene
    • Labels: None

    Description

      This is related to OAK-3276, where the intent was to use StandardAnalyzer by default (instead of OakAnalyzer). As discussed there, we need a specific word-delimiter configuration which isn't possible with StandardAnalyzer, so we should instead switch to StandardTokenizer inside OakAnalyzer itself.

      A few motivations to do that:

      • Better unicode support
      • ClassicTokenizer preserves the old (pre-Lucene 3.1) implementation of the standard tokenizer

      One of the key differences between ClassicTokenizer and StandardTokenizer is the way they delimit words (StandardTokenizer follows the Unicode text segmentation rules), but that difference is largely nullified because we apply our own WordDelimiterFilter on top.
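
      For illustration, a minimal sketch of the intended change, assuming the Lucene 4.x APIs Oak used at the time (Version.LUCENE_47 and createComponents taking a Reader). The class name, the lower-case filter and the particular WordDelimiterFilter flags below are illustrative only, not the actual OakAnalyzer source:

      import java.io.Reader;

      import org.apache.lucene.analysis.Analyzer;
      import org.apache.lucene.analysis.TokenStream;
      import org.apache.lucene.analysis.core.LowerCaseFilter;
      import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
      import org.apache.lucene.analysis.standard.StandardTokenizer;
      import org.apache.lucene.util.Version;

      // Sketch only: swap ClassicTokenizer for StandardTokenizer while keeping
      // the analyzer's own word-delimiter filtering.
      public class OakAnalyzerSketch extends Analyzer {

          private final Version matchVersion = Version.LUCENE_47;

          @Override
          protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
              // StandardTokenizer follows Unicode text segmentation (UAX#29);
              // ClassicTokenizer keeps the pre-3.1 "standard" grammar.
              StandardTokenizer source = new StandardTokenizer(matchVersion, reader);
              TokenStream filtered = new LowerCaseFilter(matchVersion, source);
              // The custom word-delimiter configuration is why switching to
              // StandardAnalyzer (OAK-3276) was not an option; flags here are examples.
              filtered = new WordDelimiterFilter(filtered,
                      WordDelimiterFilter.GENERATE_WORD_PARTS
                              | WordDelimiterFilter.GENERATE_NUMBER_PARTS
                              | WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE,
                      null);
              return new TokenStreamComponents(source, filtered);
          }
      }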


            People

              Assignee: Vikas Saurabh (catholicon)
              Reporter: Vikas Saurabh (catholicon)
              Votes: 0
              Watchers: 3
