Jackrabbit Oak / OAK-9145

OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: indexing, jcr, lucene
    • Environment: Discovered while performing DAM searches in Adobe Experience Manager.

    Description

      I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS flag, which is meant to split camelCase/PascalCase terms into their word parts. Because LowerCaseFilter runs first, however, the case boundaries are already gone by the time WordDelimiterFilter sees the token, so the term cannot be split.

      For example, a search for savings against the damAssetLucene index (which uses the default OakAnalyzer) does not find an asset named savingsAccount.svg.
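      To illustrate why the order matters, here is a simplified, stdlib-only Java model of the two filters. It is not Lucene's API: lowerCase and splitWordParts are hypothetical stand-ins for LowerCaseFilter and for WordDelimiterFilter's case-change splitting, and the whole file name is treated as a single input token (an assumption about what the tokenizer emits).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Simplified model of an analysis chain (hypothetical, NOT Lucene's API):
 * each "filter" maps a list of tokens to a new list, applied in order.
 */
class FilterOrderDemo {

    // Hypothetical stand-in for LowerCaseFilter: lower-cases every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Hypothetical stand-in for WordDelimiterFilter's word-part splitting:
    // splits at lower-to-upper case transitions and drops non-alphanumeric chars.
    static List<String> splitWordParts(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            StringBuilder part = new StringBuilder();
            for (int i = 0; i < t.length(); i++) {
                char c = t.charAt(i);
                if (!Character.isLetterOrDigit(c)) {
                    flush(part, out); // delimiter such as '.' ends the current part
                    continue;
                }
                if (i > 0 && Character.isLowerCase(t.charAt(i - 1)) && Character.isUpperCase(c)) {
                    flush(part, out); // case change ends the current part
                }
                part.append(c);
            }
            flush(part, out);
        }
        return out;
    }

    // Emits the accumulated part (if any) and resets the buffer.
    private static void flush(StringBuilder part, List<String> out) {
        if (part.length() > 0) {
            out.add(part.toString());
            part.setLength(0);
        }
    }

    // Applies the filters left to right, like a configured filter chain.
    @SafeVarargs
    static List<String> analyze(String input, UnaryOperator<List<String>>... filters) {
        List<String> tokens = List.of(input);
        for (UnaryOperator<List<String>> f : filters) {
            tokens = f.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String name = "savingsAccount.svg";
        // Lower-case first: the case boundary is gone before splitting.
        System.out.println(analyze(name, FilterOrderDemo::lowerCase, FilterOrderDemo::splitWordParts));
        // Split first: "savings" becomes a term of its own.
        System.out.println(analyze(name, FilterOrderDemo::splitWordParts, FilterOrderDemo::lowerCase));
    }
}
```

      Running main prints [savingsaccount, svg] for the lower-case-first chain and [savings, account, svg] for the split-first chain, so in this model only the second order yields a savings term that the query can match.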

      After configuring the index's analyzers (/oak:index/damAssetLucene/analyzers) to apply WordDelimiterFilter before LowerCaseFilter, the search behaved as expected. The analyzer definition used was:

      {
        "jcr:primaryType": "nt:unstructured",
        "default": {
          "jcr:primaryType": "nt:unstructured",
          "tokenizer": {
            "jcr:primaryType": "nt:unstructured",
            "name": "Standard"
          },
          "filters": {
            "jcr:primaryType": "nt:unstructured",
            "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
            "LowerCase": {"jcr:primaryType": "nt:unstructured"}
          }
        }
      }
      


People

    • Fabrizio Fortino (fortino)
    • Dave Hughes (dave.l.hughes)
    • Votes: 0
    • Watchers: 3
