  1. Apache Jena
  2. JENA-776

LowerCaseKeywordAnalyzer for jena-text

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: Jena 2.12.1
    • Component/s: Text
    • Labels: None

    Description

      I liked the option to specify an Analyzer for jena-text, as implemented in JENA-654. However, I'd like to use an analyzer that behaves like KeywordAnalyzer but is case-insensitive, for use in an autocomplete/typeahead UI widget. Lucene doesn't include such an analyzer, but there are several implementations of the same idea, e.g. in neo4j [1] and stargate [2].
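
      To illustrate the behaviour being asked for, here is a minimal sketch in plain Java. It is not the attached patch and does not use Lucene; the class LowerCaseKeywordSketch and its methods are hypothetical names made up for this example. It only mimics what "keyword analysis plus lowercasing" means for a typeahead lookup: each value is kept as a single token, and the indexed value and the query are both lowercased before comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustration only: mimics "keyword + lowercase" analysis without Lucene. */
final class LowerCaseKeywordSketch {

    // Keyword-style analysis: the whole value stays one token (no splitting on
    // whitespace); the only normalisation applied is lowercasing.
    public static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Hypothetical in-memory "index": analyzed token -> original label.
    // Labels that differ only in case overwrite each other in this sketch.
    private final NavigableMap<String, String> index = new TreeMap<>();

    public void add(String label) {
        index.put(analyze(label), label);
    }

    // Prefix lookup as a typeahead widget might issue it. The typed text is
    // analyzed the same way as the labels, so case differences do not matter.
    public List<String> complete(String typed) {
        String prefix = analyze(typed);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            hits.add(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        LowerCaseKeywordSketch idx = new LowerCaseKeywordSketch();
        idx.add("New York");
        idx.add("New Zealand");
        idx.add("York");
        System.out.println(idx.complete("NEW Y")); // prints [New York]
        System.out.println(idx.complete("york"));  // prints [York]
    }
}
```

      Note that "york" does not match "New York" here: as with KeywordAnalyzer, the whole label is one token, so only prefixes of the full value match. A tokenizing analyzer would behave differently.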

      I created my own implementation of such an analyzer and added code to use it from the assembler. Patch attached.

      The analyzer lives in a new package, org.apache.jena.query.text.analyzer, in case other analyzers for jena-text appear in the future. If you don't like the new package, the class can of course be moved to org.apache.jena.query.text.

      I also added a test for case-insensitivity. To avoid lots of duplicate boilerplate code, I slightly modified and subclassed the existing test for KeywordAnalyzer.

      I'd love to see this in the next version of jena-text and Fuseki. Of course I'll rework the patch if necessary. I can also tweak the web documentation to mention this analyzer.

      -Osma

      [1] https://github.com/apatry/neo4j-lucene4-index/blob/master/src/main/java/org/neo4j/index/impl/lucene/LowerCaseKeywordAnalyzer.java

      [2] https://github.com/tuplejump/stargate-core/blob/master/src/main/java/com/tuplejump/stargate/lucene/CaseInsensitiveKeywordAnalyzer.java

          People

            Assignee: Andy Seaborne
            Reporter: Osma Suominen