Apache Jena / JENA-1506

Add configurable filters and tokenizers


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Jena 3.7.0
    • Fix Version/s: Jena 3.7.0
    • Component/s: Text
    • Labels: None

    Description

      In support of JENA-1488, this issue proposes a feature that allows defined filters and tokenizers, similar to DefinedAnalyzer, to be used in a ConfigurableAnalyzer, so that constructor arguments such as excludeChars can be configured. I've looked at ConfigurableAnalyzer and its assembler, and the change should be straightforward.

      I would add tokenizer and filter definitions to TextIndexLucene, analogous to the existing support for defining analyzers:

          text:defineFilters (
              [ text:defineFilter <#foo> ;
                text:filter [
                    a text:GenericFilter ;
                    text:class "fi.finto.FoldingFilter" ;
                    text:params (
                        [ text:paramName "excludeChars" ;
                          text:paramType text:TypeString ;
                          text:paramValue "whatevercharstoexclude" ]
                    )
                ]
              ]
          )
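
      By analogy with the filter definition above, a tokenizer definition might look like this. The names text:defineTokenizers, text:defineTokenizer and text:GenericTokenizer are assumptions that simply mirror the filter vocabulary, as is the idea that text:params can be omitted when the class (here Lucene's WhitespaceTokenizer, which has a no-argument constructor) needs no extra arguments.

      ```turtle
          # Hypothetical vocabulary, assumed to mirror text:defineFilters
          text:defineTokenizers (
              [ text:defineTokenizer <#ws> ;
                text:tokenizer [
                    a text:GenericTokenizer ;
                    # Lucene tokenizer that splits on whitespace; no constructor arguments
                    text:class "org.apache.lucene.analysis.core.WhitespaceTokenizer"
                ]
              ]
          )
      ```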
      

      GenericFilterAssembler and GenericTokenizerAssembler could reuse much of the code in GenericAnalyzerAssembler. The changes to ConfigurableAnalyzer and ConfigurableAnalyzerAssembler are straightforward and mostly involve retaining the full resource URI of each tokenizer and filter rather than extracting only its localName, so that defined entries such as <#foo> can be looked up.
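
      To illustrate why the full URI matters, here is a sketch of a ConfigurableAnalyzer that mixes a built-in filter (text:LowerCaseFilter, which can be recognized by its local name) with the filter defined above as <#foo>, which can only be resolved through its URI. The analyzer name <#fooAnalyzer> is hypothetical; text:tokenizer and text:filters follow the existing ConfigurableAnalyzer vocabulary, and the ability to list a defined filter URI in text:filters is the proposed behavior, not something that works today.

      ```turtle
          text:defineAnalyzers (
              [ text:defineAnalyzer <#fooAnalyzer> ;
                text:analyzer [
                    a text:ConfigurableAnalyzer ;
                    text:tokenizer text:StandardTokenizer ;
                    # built-in filter first, then the defined filter, referenced by its URI
                    text:filters ( text:LowerCaseFilter <#foo> )
                ]
              ]
          )
      ```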

      Such an addition would make it easy to plug in new tokenizers and filters, either by adding their classes to the Jena/Fuseki classpath or by referring to ones already included in Jena (via Lucene or otherwise), and then adding the appropriate assembler entries to the configuration.
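
      For the second case, referring to a class already available through Lucene, a definition might wrap Lucene's ASCIIFoldingFilter so that its preserveOriginal constructor argument can be set from the configuration. This sketch assumes that the assembler supplies the upstream TokenStream as the first constructor argument, so only the remaining arguments appear in text:params, and that a text:TypeBoolean parameter type is available alongside text:TypeString; the name <#foldKeepOriginal> is hypothetical.

      ```turtle
          text:defineFilters (
              [ text:defineFilter <#foldKeepOriginal> ;
                text:filter [
                    a text:GenericFilter ;
                    text:class "org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter" ;
                    # intended to match ASCIIFoldingFilter(TokenStream input, boolean preserveOriginal);
                    # the TokenStream is assumed to be passed in by the assembler
                    text:params (
                        [ text:paramName "preserveOriginal" ;
                          text:paramType text:TypeBoolean ;
                          text:paramValue true ]
                    )
                ]
              ]
          )
      ```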


            People

              Assignee: Code Ferret
              Reporter: Code Ferret
              Votes: 0
              Watchers: 4
