Jackrabbit Oak / OAK-9145

OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in wrong order


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • None
    • None
    • indexing, jcr, lucene
    • Discovered while performing DAM searches in Adobe Experience Manager. 

    Description

      I believe OakAnalyzer applies LowerCaseFilter and WordDelimiterFilter in the wrong order. WordDelimiterFilter is invoked with the GENERATE_WORD_PARTS flag, which splits camelCase/PascalCase tokens into multiple terms. Because LowerCaseFilter runs first, however, the case information is already lost by the time WordDelimiterFilter sees the token, so it can no longer be split.

      When searching for savings, the damAssetLucene index (which uses the default OakAnalyzer) does not find an asset named savingsAccount.svg.
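      To illustrate why the order matters, here is a minimal, Lucene-free sketch. It does not call Lucene's LowerCaseFilter or WordDelimiterFilter: FilterOrderDemo and splitWordParts are hypothetical names, and the splitter is a simplified stand-in that only breaks a token on non-alphanumeric characters and on lowercase-to-uppercase transitions (the real WordDelimiterFilter's behaviour depends on the flags it is given). The file name is fed in as a single token, ignoring whatever the tokenizer does beforehand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of the two filter orders discussed in this issue.
// splitWordParts is a hypothetical stand-in for WordDelimiterFilter, NOT Lucene's
// implementation: it splits only on non-alphanumeric characters and on
// lowercase-to-uppercase transitions.
final class FilterOrderDemo {

    static List<String> splitWordParts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // A delimiter (e.g. '.') ends the current part and is dropped.
                flush(current, parts);
                continue;
            }
            // A lowercase letter followed by an uppercase one starts a new part.
            // This can only fire if the token still carries its original case.
            if (current.length() > 0
                    && Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(c)) {
                flush(current, parts);
            }
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    // Order the issue attributes to the default OakAnalyzer: LowerCase, then WordDelimiter.
    static List<String> lowerCaseThenSplit(String token) {
        return splitWordParts(token.toLowerCase(Locale.ROOT));
    }

    // Order used in the custom analyzer configuration: WordDelimiter, then LowerCase.
    static List<String> splitThenLowerCase(String token) {
        List<String> terms = new ArrayList<>();
        for (String part : splitWordParts(token)) {
            terms.add(part.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        String name = "savingsAccount.svg";
        System.out.println("LowerCase -> WordDelimiter: " + lowerCaseThenSplit(name));
        System.out.println("WordDelimiter -> LowerCase: " + splitThenLowerCase(name));
    }
}
```

      Running main prints [savingsaccount, svg] for the lowercase-first order and [savings, account, svg] for the split-first order, so in this model only the second ordering produces a savings term that a query for savings can match.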

      After configuring the index's analyzers (/oak:index/damAssetLucene/analyzers) as follows, so that WordDelimiterFilter is applied before LowerCaseFilter, the correct behaviour was observed.

      {
        "jcr:primaryType": "nt:unstructured",
        "default": {
          "jcr:primaryType": "nt:unstructured",
          "tokenizer": {
            "jcr:primaryType": "nt:unstructured",
            "name": "Standard"
          },
          "filters": {
            "jcr:primaryType": "nt:unstructured",
            "WordDelimiter": {"jcr:primaryType": "nt:unstructured"},
            "LowerCase": {"jcr:primaryType": "nt:unstructured"}
          }
        }
      }
      

            People
            Fabrizio Fortino (fortino)
            Dave Hughes (dave.l.hughes)
            Votes: 0
            Watchers: 3
