Lucene - Core / LUCENE-8240

Make TokenStreamComponents.setReader public

Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      The simplest change would be to make TokenStreamComponents.setReader() public. An alternative would be to provide a SubFieldAnalyzer along the lines of the attached one, although for the reasons given below I think this implementation is a little hacky and would ideally be supported in a different way before becoming part of a public Lucene API.

      Exposing this method would allow third-party extensions to call it in order to wrap TokenStreamComponents. My use case is a SubFieldAnalyzer (attached, for reference) that applies different analysis to different instances of a field. This supports a big "catch-all" field whose instances receive different (index-time) text processing. We implement that by creating a TokenStreamComponents that wraps separate per-subfield components and switches among them when setReader() is called.

      Why setReader()? This is the only part of the API where we can inject this notion of subfields. setReader() is called with a Reader for each field instance, and we supply a special Reader that identifies its subfield.
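
The dispatch described above can be sketched roughly as follows. This is a minimal, self-contained model and not the attached SubFieldAnalyzer: the names SubFieldSketch, SubFieldReader, Components, SimpleComponents and SubFieldComponents are hypothetical stand-ins that use only the JDK, and "analysis" is reduced to a function over the whole text so the switching logic stays visible.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Simplified stand-ins, NOT the Lucene classes: they only model the setReader() dispatch.
final class SubFieldSketch {

    /** Hypothetical Reader wrapper that identifies the subfield it belongs to. */
    static final class SubFieldReader extends FilterReader {
        final String subField;

        SubFieldReader(String subField, Reader in) {
            super(in);
            this.subField = subField;
        }
    }

    /** Stand-in for TokenStreamComponents: takes a Reader, then produces processed text. */
    interface Components {
        void setReader(Reader reader);

        String analyze();
    }

    /** Stand-in for ordinary components that apply one fixed kind of analysis. */
    static final class SimpleComponents implements Components {
        private final UnaryOperator<String> analysis;
        private Reader reader;

        SimpleComponents(UnaryOperator<String> analysis) {
            this.analysis = analysis;
        }

        @Override
        public void setReader(Reader reader) {
            this.reader = reader;
        }

        @Override
        public String analyze() {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[256];
            try {
                for (int n = reader.read(buf); n != -1; n = reader.read(buf)) {
                    text.append(buf, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return analysis.apply(text.toString());
        }
    }

    /** Wrapper that switches among per-subfield components when setReader() is called. */
    static final class SubFieldComponents implements Components {
        private final Map<String, Components> perSubField;
        private final Components fallback;
        private Components current;

        SubFieldComponents(Map<String, Components> perSubField, Components fallback) {
            this.perSubField = perSubField;
            this.fallback = fallback;
            this.current = fallback;
        }

        @Override
        public void setReader(Reader reader) {
            Components target = fallback;
            if (reader instanceof SubFieldReader) {
                String name = ((SubFieldReader) reader).subField;
                target = perSubField.getOrDefault(name, fallback);
            }
            // The call on a separate instance: this is what needs setReader() to be accessible.
            target.setReader(reader);
            current = target;
        }

        @Override
        public String analyze() {
            return current.analyze();
        }
    }

    /** Feeds the same text through the wrapper as two subfields and as a plain Reader. */
    static String demo() {
        Map<String, Components> bySubField = new HashMap<>();
        bySubField.put("title", new SimpleComponents(String::toUpperCase));
        bySubField.put("body", new SimpleComponents(String::toLowerCase));
        Components wrapper = new SubFieldComponents(bySubField, new SimpleComponents(s -> s));

        StringBuilder out = new StringBuilder();
        wrapper.setReader(new SubFieldReader("title", new StringReader("Hello World")));
        out.append(wrapper.analyze()).append('|');
        wrapper.setReader(new SubFieldReader("body", new StringReader("Hello World")));
        out.append(wrapper.analyze()).append('|');
        wrapper.setReader(new StringReader("Hello World"));
        out.append(wrapper.analyze());
        return out.toString();
    }
}
```

In this model, SubFieldSketch.demo() returns "HELLO WORLD|hello world|Hello World": one wrapper instance is reused for every field instance, and the Reader alone decides which inner components handle it.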

      This is a bit hacky – ideally subfields would be first-class citizens in the Analyzer API, so that, e.g., there would be methods like Analyzer.createComponents(String fieldName, String subFieldName). However, this seems like a pretty big change for an experimental feature, so living with the Reader-per-subfield hack seems like an OK tradeoff for now.

      Currently SubFieldAnalyzer has to live in the org.apache.lucene.analysis package in order to call TokenStreamComponents.setReader (on a separate instance) and satisfy Java's access rules, which is awkward.

      Attachments

        1. SubFieldAnalyzer.java (5 kB, Michael Sokolov)

          People

            Assignee: Unassigned
            Reporter: Michael Sokolov
            Votes: 0
            Watchers: 3
