Lucene - Core
LUCENE-662

Extendable writer and reader of field data

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels:
      None

      Description

      As discussed on the dev mailing list, I have modified Lucene so that you can define how the data of a field is written to and read from the index.

      Basically, I have introduced the notion of an IndexFormat, which is in fact a factory of FieldsWriter and FieldsReader. The IndexReader, the IndexWriter and the SegmentMerger now use this factory instead of doing a "new FieldsReader/Writer()".
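
To make the factory idea concrete, here is a minimal sketch. The class names IndexFormat, FieldsWriter, FieldsReader and DefaultIndexFormat come from this description; the method names (newFieldsWriter, newFieldsReader, describe) and the SegmentWriterSketch class are assumptions made for illustration only and are not taken from the patch.

```java
// Hypothetical, simplified stand-ins: the real signatures in the patch are not shown in this issue.
interface FieldsWriter {
    String describe();
}

interface FieldsReader {
    String describe();
}

// The index format is a factory for the writer/reader pair.
interface IndexFormat {
    FieldsWriter newFieldsWriter();
    FieldsReader newFieldsReader();
}

class DefaultFieldsWriter implements FieldsWriter {
    public String describe() { return "default writer"; }
}

class DefaultFieldsReader implements FieldsReader {
    public String describe() { return "default reader"; }
}

// Default format: hands out the default implementations.
class DefaultIndexFormat implements IndexFormat {
    public FieldsWriter newFieldsWriter() { return new DefaultFieldsWriter(); }
    public FieldsReader newFieldsReader() { return new DefaultFieldsReader(); }
}

// Stand-in for a consumer such as IndexWriter or SegmentMerger:
// it asks the format for a writer instead of calling a constructor directly.
class SegmentWriterSketch {
    private final IndexFormat format;

    SegmentWriterSketch(IndexFormat format) {
        this.format = format;
    }

    FieldsWriter openFieldsWriter() {
        return format.newFieldsWriter();
    }
}
```

With this shape, switching the on-disk representation of fields would only mean passing a different IndexFormat to the consumer.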

      I have also introduced the notion of FieldData. It holds all the data of a field and also handles writing it to and reading it from a stream. I did it this way because, in the current design of Lucene, Fieldable is an interface, so methods with protected or package visibility cannot be defined on it.

      A FieldsWriter just writes data into a stream via the FieldData of the field.
      A FieldsReader instantiates a FieldData depending on the field name. Then it uses the field data to read the stream. Finally, it instantiates a Field with the field data.
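
The write and read paths described above can be sketched as follows. This is a simplified model built on java.io streams, not the patch's code: the method signatures, the StringFieldData and IntFieldData subclasses, the name-to-factory map and the choice to store the field name in the stream are all assumptions made for the example.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.function.Supplier;

// Holds the value of a field and knows how to serialize it (assumed signatures).
abstract class FieldData {
    abstract void write(DataOutput out) throws IOException;
    abstract void read(DataInput in) throws IOException;
    abstract String stringValue();
}

class StringFieldData extends FieldData {
    private String value = "";
    StringFieldData() {}
    StringFieldData(String value) { this.value = value; }
    void write(DataOutput out) throws IOException { out.writeUTF(value); }
    void read(DataInput in) throws IOException { value = in.readUTF(); }
    String stringValue() { return value; }
}

// Hypothetical custom encoding: a fixed 4-byte integer.
class IntFieldData extends FieldData {
    private int value;
    IntFieldData() {}
    IntFieldData(int value) { this.value = value; }
    void write(DataOutput out) throws IOException { out.writeInt(value); }
    void read(DataInput in) throws IOException { value = in.readInt(); }
    String stringValue() { return Integer.toString(value); }
}

class Field {
    final String name;
    final FieldData data;
    Field(String name, FieldData data) { this.name = name; this.data = data; }
}

// The writer delegates the encoding to the field's FieldData.
class FieldsWriter {
    private final DataOutput out;
    FieldsWriter(DataOutput out) { this.out = out; }

    void addField(Field field) throws IOException {
        out.writeUTF(field.name);   // assumption: the name is stored inline
        field.data.write(out);
    }
}

// The reader picks a FieldData by field name, lets it read, then builds the Field.
class FieldsReader {
    private final DataInput in;
    private final Map<String, Supplier<FieldData>> dataByFieldName;

    FieldsReader(DataInput in, Map<String, Supplier<FieldData>> dataByFieldName) {
        this.in = in;
        this.dataByFieldName = dataByFieldName;
    }

    Field readField() throws IOException {
        String name = in.readUTF();
        Supplier<FieldData> factory = dataByFieldName.get(name);
        // Unknown field names fall back to plain string data in this sketch.
        FieldData data = (factory != null) ? factory.get() : new StringFieldData();
        data.read(in);
        return new Field(name, data);
    }
}
```

A field written with IntFieldData must be read back with IntFieldData, which is why the reader needs a mapping from field name to the FieldData to instantiate.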

      About compatibility, I think it is preserved, as I have written a DefaultIndexFormat that provides a DefaultFieldsWriter and a DefaultFieldsReader. These implementations do exactly the job that is done today.
      To achieve this modification, some classes and methods had to be changed from private and/or final to public or protected.

      About the lazy fields, I have implemented them in a more general way inside the abstract class FieldData, so laziness is totally transparent for a Lucene user who extends FieldData. The stream is kept in the FieldData and is only used when stringValue() (or another accessor) is called. Implementing it this way also allowed me to handle the recently introduced LOAD_FOR_MERGE: it is just a lazy field data, and when read() is called on this lazy field data, the saved input stream is copied directly into the output stream.
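
The lazy behaviour can be illustrated with a small model. It is a sketch under assumptions, not the patch itself: a byte array with an offset and a length stands in for the saved index stream, the raw copy is placed in a write() method, and the names LazyFieldData, LazyStringFieldData, attach, decode, encode and ensureLoaded are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.IOException;
import java.io.UncheckedIOException;

// Base class that hides laziness from subclasses (hypothetical names throughout).
abstract class LazyFieldData {
    private byte[] source;          // stand-in for the saved input stream
    private int offset;
    private int length;
    private boolean loaded = true;  // data built in memory is already "loaded"

    // Remember where the encoded value lives without decoding it.
    final void attach(byte[] source, int offset, int length) {
        this.source = source;
        this.offset = offset;
        this.length = length;
        this.loaded = false;
    }

    final boolean isLoaded() { return loaded; }

    protected abstract void decode(DataInput in) throws IOException;
    protected abstract void encode(DataOutput out) throws IOException;

    // Called by accessors: decode on first use only.
    protected final void ensureLoaded() {
        if (loaded) {
            return;
        }
        try {
            decode(new DataInputStream(new ByteArrayInputStream(source, offset, length)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        loaded = true;
    }

    // Merge path: if the value was never decoded, copy the stored bytes as they are.
    final void write(DataOutput out) throws IOException {
        if (!loaded) {
            out.write(source, offset, length);
        } else {
            encode(out);
        }
    }
}

class LazyStringFieldData extends LazyFieldData {
    private String value;

    LazyStringFieldData() {}
    LazyStringFieldData(String value) { this.value = value; }

    protected void decode(DataInput in) throws IOException { value = in.readUTF(); }
    protected void encode(DataOutput out) throws IOException { out.writeUTF(value); }

    String stringValue() {
        ensureLoaded();
        return value;
    }
}
```

In this model a merge never pays the decoding cost: an untouched lazy value is copied byte for byte, and only a call such as stringValue() triggers the actual read.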

      I have one remaining issue with this patch. The current design allows reading an index in an old format and just doing a writer.addIndexes() into a new format. With the new design you cannot, because the writer will use the FieldData.write provided by the reader.

      Enjoy!

      1. indexFormat.patch
        185 kB
        Nicolas Lalevée
      2. indexFormat-only.patch
        39 kB
        Nicolas Lalevée
      3. indexFormat.patch
        173 kB
        Nicolas Lalevée
      4. indexFormat.patch
        194 kB
        Nicolas Lalevée
      5. entrytable.patch
        43 kB
        Nicolas Lalevée
      6. generic-fieldIO-5.patch
        98 kB
        Nicolas Lalevée
      7. generic-fieldIO-4.patch
        151 kB
        Nicolas Lalevée
      8. generic-fieldIO-3.patch
        169 kB
        Nicolas Lalevée
      9. generic-fieldIO-2.patch
        163 kB
        Nicolas Lalevée
      10. ASF.LICENSE.NOT.GRANTED--generic-fieldIO.patch
        88 kB
        Nicolas Lalevée


            People

            • Assignee:
              Unassigned
              Reporter:
              Nicolas Lalevée
            • Votes:
              1
              Watchers:
              1
