Lucene - Core
LUCENE-2309

Fully decouple IndexWriter from analyzers

    Details

    • Lucene Fields:
      New

      Description

      IndexWriter only needs an AttributeSource to do indexing.

      Yet, today, it interacts with Field instances, holds a private
      analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
      wide variety of cases (the field is not analyzed; it is analyzed, from
      a Reader or a String; it is pre-analyzed).

      I'd like to have IW only interact with attr sources that already
      arrived with the fields. This would be a powerful decoupling – it
      means others are free to make their own attr sources.

      They need not even use any of Lucene's analysis impls; e.g. they can
      integrate with other things like OpenPipeline.
      Or make something completely custom.
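
      As a rough illustration of the decoupling described above, here is a
      minimal, self-contained sketch. Every type in it (AttrSource,
      ListAttrSource, ToyIndexWriter) is a hypothetical stand-in invented for
      this example, not Lucene's actual API. The toy writer sees only an
      attribute source that exposes term bytes, so a custom source can feed
      it without any analyzer being involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these types are Lucene's real classes.
final class DecoupledWriterSketch {

    /** Stand-in for an attribute source: a stream of terms exposed as bytes. */
    interface AttrSource {
        boolean incrementToken(); // advance to the next token; false when exhausted
        byte[] termBytes();       // bytes of the current term
    }

    /** A custom source that uses no analyzer: it replays a fixed list of terms. */
    static final class ListAttrSource implements AttrSource {
        private final Iterator<String> terms;
        private String current;

        ListAttrSource(List<String> terms) {
            this.terms = terms.iterator();
        }

        @Override
        public boolean incrementToken() {
            if (!terms.hasNext()) {
                return false;
            }
            current = terms.next();
            return true;
        }

        @Override
        public byte[] termBytes() {
            return current.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Toy writer: counts term frequencies per field, seeing only AttrSource. */
    static final class ToyIndexWriter {
        private final Map<String, Integer> termFreqs = new TreeMap<>();

        void addField(String fieldName, AttrSource source) {
            // The writer never asks how the tokens were produced.
            while (source.incrementToken()) {
                String term = new String(source.termBytes(), StandardCharsets.UTF_8);
                termFreqs.merge(fieldName + ":" + term, 1, Integer::sum);
            }
        }

        int freq(String fieldName, String term) {
            return termFreqs.getOrDefault(fieldName + ":" + term, 0);
        }
    }

    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter();
        writer.addField("body", new ListAttrSource(Arrays.asList("lucene", "index", "lucene")));
        System.out.println(writer.freq("body", "lucene")); // prints 2
    }
}
```

      Any other producer (an external pipeline, a pre-tokenized feed) would
      only need to implement the same small interface.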

      LUCENE-2302 is already a big step towards this: it makes IW agnostic
      about which attr is "the term", and only requires that it provide a
      BytesRef (for flex).

      Then I think LUCENE-2308 would get us most of the remaining way: i.e.,
      if the FieldType knows the analyzer to use, then we could simply create
      a getAttrSource() method (say) on it and move all the logic IW has
      today onto there. (We'd still need existing IW code for back-compat.)
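
      To make the getAttrSource() idea concrete, here is a hedged sketch of
      how a field type might absorb the case analysis that IW does today
      (not analyzed, analyzed String or Reader, pre-analyzed). All names and
      signatures below are assumptions made up for illustration; they are
      not taken from LUCENE-2308 or from any patch attached to this issue.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: these are illustrative stand-ins, not Lucene classes.
final class FieldTypeSketch {

    interface AttrSource {
        boolean incrementToken(); // false when no tokens remain
        String term();            // current term
    }

    /** Hypothetical analyzer: turns text into an attribute source. */
    interface Analyzer {
        AttrSource analyze(String text);
    }

    static final class ListAttrSource implements AttrSource {
        private final Iterator<String> it;
        private String current;
        ListAttrSource(List<String> terms) { this.it = terms.iterator(); }
        @Override public boolean incrementToken() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
        @Override public String term() { return current; }
    }

    /** Splits on whitespace and lowercases; for illustration only. */
    static final class WhitespaceLowercaseAnalyzer implements Analyzer {
        @Override public AttrSource analyze(String text) {
            String trimmed = text.trim();
            List<String> terms = trimmed.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
            return new ListAttrSource(terms);
        }
    }

    /** The field type, not the writer, decides how a value becomes tokens. */
    static final class FieldType {
        private final Analyzer analyzer; // null means "not analyzed"

        FieldType(Analyzer analyzer) { this.analyzer = analyzer; }

        AttrSource getAttrSource(Object value) {
            if (value instanceof AttrSource) {
                return (AttrSource) value; // pre-analyzed: pass through untouched
            }
            String text = (value instanceof Reader) ? readAll((Reader) value) : value.toString();
            if (analyzer == null) {
                // Not analyzed: the whole value is a single term.
                return new ListAttrSource(Collections.singletonList(text));
            }
            return analyzer.analyze(text);
        }

        private static String readAll(Reader reader) {
            try {
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[1024];
                int n;
                while ((n = reader.read(buf)) != -1) sb.append(buf, 0, n);
                return sb.toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Helper that collects every term from a source into a list. */
    static List<String> drain(AttrSource source) {
        List<String> out = new ArrayList<>();
        while (source.incrementToken()) out.add(source.term());
        return out;
    }

    public static void main(String[] args) {
        FieldType analyzed = new FieldType(new WhitespaceLowercaseAnalyzer());
        FieldType notAnalyzed = new FieldType(null);
        System.out.println(drain(analyzed.getAttrSource("Fully Decouple IndexWriter")));
        System.out.println(drain(notAnalyzed.getAttrSource(new StringReader("Exact Value"))));
    }
}
```

      In this arrangement the writer would call getAttrSource() and consume
      the result, with no knowledge of analyzers, Readers, or Strings.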

      1. LUCENE-2309-getTSFromField.patch
        19 kB
        Chris Male
      2. LUCENE-2309-analyzer-based.patch
        13 kB
        Chris Male
      3. LUCENE-2309.patch
        14 kB
        Chris Male

          People

          • Assignee:
            Chris Male
            Reporter:
            Michael McCandless