Jackrabbit Oak / OAK-3648

Use StandardTokenizer instead of ClassicTokenizer in OakAnalyzer


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.34, 1.2.19, 1.3.11, 1.4
    • Component/s: lucene
    • Labels: None

      Description

      This is related to OAK-3276, where the intent was to use StandardAnalyzer by default (instead of OakAnalyzer). As discussed there, we need a specific word-delimiter configuration which isn't possible with StandardAnalyzer, so we should instead switch over to StandardTokenizer in OakAnalyzer itself.

      A few motivations to do that:

      • Better Unicode support
      • ClassicTokenizer is the old (~Lucene 3.1) implementation of the standard tokenizer

      One of the key differences between the classic and standard tokenizers is the way they delimit words (StandardTokenizer follows the Unicode text segmentation rules), but that difference gets nullified as we apply our own WordDelimiterFilter.
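      StandardTokenizer's word delimiting follows Unicode text segmentation (UAX #29). As an illustration of what those rules do (this is a standalone sketch using the JDK's java.text.BreakIterator, which implements the same word-boundary rules, not Oak or Lucene code), note how a hyphenated, accented string is split:

      ```java
      import java.text.BreakIterator;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.Locale;

      public class Uax29Demo {
          // Extract word tokens using the JDK's BreakIterator, which applies the
          // UAX #29 word-boundary rules that StandardTokenizer also follows.
          static List<String> words(String text) {
              BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
              it.setText(text);
              List<String> out = new ArrayList<>();
              int start = it.first();
              for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                  String candidate = text.substring(start, end);
                  // Keep only tokens containing a letter or digit; the iterator
                  // also yields the spaces and punctuation between words.
                  if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                      out.add(candidate);
                  }
              }
              return out;
          }

          public static void main(String[] args) {
              // Hyphens are word boundaries under UAX #29, and accented letters
              // stay inside their word.
              System.out.println(words("OakAnalyzer splits café-style text"));
              // → [OakAnalyzer, splits, café, style, text]
          }
      }
      ```

      A compound like "café-style" is split at the hyphen by these rules alone; in OakAnalyzer, further splitting (e.g. on case changes or digits) is delegated to the WordDelimiterFilter mentioned above, which is why the classic-vs-standard delimiting difference largely washes out.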

              People

              • Assignee:
                catholicon Vikas Saurabh
                Reporter:
                catholicon Vikas Saurabh
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue
