[JENA-776] LowerCaseKeywordAnalyzer for jena-text - ASF JIRA


Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: Jena 2.12.1
Component/s: Text
Labels: None

Description

I liked the option to specify an Analyzer for jena-text, as implemented in JENA-654. However, I'd like to use an analyzer that behaves like KeywordAnalyzer but is case-insensitive, for use in an autocomplete/typeahead UI widget. Lucene doesn't include such an analyzer, but there are several implementations of the same idea, e.g. in neo4j [1] and stargate [2].
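To illustrate the intended behaviour, here is a minimal plain-Java sketch that does not use Lucene at all. The class and method names (`LowerCaseKeywordSketch`, `analyze`, `complete`) are hypothetical and are not part of the attached patch. The idea: the whole field value is kept as a single token, as KeywordAnalyzer does, and that token is lowercased, so a typeahead prefix lookup ignores case but never matches on individual words inside a label. In Lucene terms this roughly corresponds to a KeywordTokenizer followed by a LowerCaseFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, Lucene-free sketch of a "lowercase keyword" analysis:
 * the entire value becomes ONE token, and that token is lowercased.
 */
public class LowerCaseKeywordSketch {

    // Keyword-style analysis: no splitting on whitespace, only lowercasing.
    // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotted/dotless i).
    static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Typeahead lookup: return the original labels whose analyzed form
    // starts with the analyzed user input.
    static List<String> complete(List<String> labels, String typed) {
        String prefix = analyze(typed);
        List<String> hits = new ArrayList<>();
        for (String label : labels) {
            if (analyze(label).startsWith(prefix)) {
                hits.add(label);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("New York", "new jersey", "Newcastle", "York");
        // Case-insensitive prefix match on the whole label.
        System.out.println(complete(labels, "NEW Y"));
        // Only "York" matches: "New York" is one token, so "york" is not a prefix of it.
        System.out.println(complete(labels, "york"));
    }
}
```

The second lookup shows why a keyword-style analyzer suits autocomplete better than a word-tokenizing one: a tokenizing analyzer would index "york" as a separate token of "New York" and return it for that input too.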

I created my own implementation of such an analyzer and added code to use it from the assembler. Patch attached.

This analyzer is in a new package, org.apache.jena.query.text.analyzer, in case more analyzers for jena-text appear in the future. If you don't like the new package, the class can of course be moved to org.apache.jena.query.text.

I also added a test for case-insensitivity. To avoid duplicating boilerplate code, I slightly modified the existing test for KeywordAnalyzer and subclassed it.

I'd love to see this in the next version of jena-text and Fuseki. Of course I'll rework the patch if necessary. I can also tweak the web documentation to mention this analyzer.

-Osma

[1] https://github.com/apatry/neo4j-lucene4-index/blob/master/src/main/java/org/neo4j/index/impl/lucene/LowerCaseKeywordAnalyzer.java

[2] https://github.com/tuplejump/stargate-core/blob/master/src/main/java/com/tuplejump/stargate/lucene/CaseInsensitiveKeywordAnalyzer.java