Jackrabbit Oak / OAK-9145

OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: indexing, jcr, lucene
    • Environment:

      Discovered while performing DAM searches in Adobe Experience Manager. 

      Description

      I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS flag, which splits camelCase/PascalCase words into multiple terms, but because LowerCaseFilter is applied first, the case information is lost and the words can no longer be split at their case boundaries.

      A search for savings against the damAssetLucene index (which uses the default OakAnalyzer) does not find an asset named savingsAccount.svg.
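      To illustrate why the order matters, the sketch below models both filter orders in plain Java. It is not Lucene's WordDelimiterFilter: the hypothetical splitWordParts helper only splits on non-alphanumeric characters and on lower-to-upper case changes, and the sketch assumes the tokenizer emits the file name savingsAccount.svg as a single token and that the query savings reaches the index as the single term savings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FilterOrderDemo {

    // Hypothetical, simplified stand-in for WordDelimiterFilter word-part generation:
    // splits on non-alphanumeric characters and on lower-to-upper case changes.
    // It does NOT model number parts, possessive stemming, or keeping the original term.
    static List<String> splitWordParts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // Delimiter such as '.': close the current part and drop the delimiter.
                flush(current, parts);
                continue;
            }
            if (current.length() > 0
                    && Character.isLowerCase(current.charAt(current.length() - 1))
                    && Character.isUpperCase(c)) {
                // Case change ("sA" in "savingsAccount") starts a new part.
                flush(current, parts);
            }
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    static String lowerCase(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Order reported in this issue: lowercase first, then split.
    static List<String> lowerCaseThenSplit(String token) {
        return splitWordParts(lowerCase(token));
    }

    // Proposed order: split first, then lowercase each part.
    static List<String> splitThenLowerCase(String token) {
        List<String> out = new ArrayList<>();
        for (String part : splitWordParts(token)) {
            out.add(lowerCase(part));
        }
        return out;
    }

    // Simplified term lookup: does the lowercased query term appear among the indexed terms?
    static boolean matches(String queryTerm, List<String> indexedTerms) {
        return indexedTerms.contains(lowerCase(queryTerm));
    }

    public static void main(String[] args) {
        String name = "savingsAccount.svg";
        System.out.println(lowerCaseThenSplit(name));                 // [savingsaccount, svg]
        System.out.println(splitThenLowerCase(name));                 // [savings, account, svg]
        System.out.println(matches("savings", lowerCaseThenSplit(name))); // false
        System.out.println(matches("savings", splitThenLowerCase(name))); // true
    }
}
```

      In this model only the split-first order produces a separate savings term, which is the behaviour the reordered analyzer configuration below restores.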

      After configuring the index's analyzers (/oak:index/damAssetLucene/analyzers) to apply WordDelimiterFilter before LowerCaseFilter, as follows, the expected behaviour was observed:

      {
        "jcr:primaryType": "nt:unstructured",
        "default": {
          "jcr:primaryType": "nt:unstructured",
          "tokenizer": {
            "jcr:primaryType": "nt:unstructured",
            "name": "Standard"
          },
          "filters": {
            "jcr:primaryType": "nt:unstructured",
            "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
            "LowerCase": {"jcr:primaryType": "nt:unstructured"}
          }
        }
      }
      

    People

    • Assignee: Fabrizio Fortino (fortino)
    • Reporter: Dave Hughes (dave.l.hughes)
    • Votes: 0
    • Watchers: 3
