Lucene - Core
LUCENE-662

Extendable writer and reader of field data

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels: None

      Description

      As discussed on the dev mailing list, I have modified Lucene to allow defining how the data of a field is written to and read from the index.

      Basically, I have introduced the notion of an IndexFormat. It is in fact a factory of FieldsWriter and FieldsReader, so the IndexReader, the IndexWriter and the SegmentMerger use this factory instead of doing a "new FieldsReader/Writer()".
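
      As a rough illustration of this factory idea (a hedged sketch; the method names and signatures here are guesses for illustration, not the patch's actual API):

          public abstract class IndexFormat {
            // Instead of "new FieldsWriter(...)", IndexWriter and SegmentMerger
            // would ask the format for a writer...
            public abstract FieldsWriter createFieldsWriter(Directory dir, String segment, FieldInfos infos) throws IOException;
            // ...and IndexReader would ask it for a reader.
            public abstract FieldsReader createFieldsReader(Directory dir, String segment, FieldInfos infos) throws IOException;
          }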

      I have also introduced the notion of FieldData. It holds all the data of a field, and also handles writing it to and reading it from a stream. I did it this way because in the current design of Lucene, Fieldable is an interface, so methods with protected or package visibility cannot be defined on it.

      A FieldsWriter just writes data into a stream via the FieldData of the field.
      A FieldsReader instantiates a FieldData depending on the field name. Then it uses the field data to read the stream. And finally it instantiates a Field with the field data.
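
      Hypothetically, the read path inside a FieldsReader would look something like this (a sketch; the helper names follow the description above, not necessarily the patch):

          FieldData data = newFieldData(fi.name);   // hypothetical factory hook, chosen by field name
          data.read(fieldsStream);                  // the FieldData decodes its own bytes from the stream
          doc.add(new Field(fi.name, data));        // finally wrap the field data in a Field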

      About compatibility, I think it is preserved, as I have written a DefaultIndexFormat that provides a DefaultFieldsWriter and a DefaultFieldsReader. These implementations do exactly the job that is done today.
      To achieve this modification, some classes and methods had to be changed from private and/or final to public or protected.

      About the lazy fields, I have implemented them in a more general way in the abstract class FieldData, so laziness will be totally transparent to the Lucene user who extends FieldData. The stream is kept in the FieldData and consumed as soon as stringValue() (or another accessor) is called. Implementing it this way also allowed me to handle the recently introduced LOAD_FOR_MERGE: it is just a lazy field data, and when read() is called on it, the saved input stream is copied directly to the output stream.
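
      A minimal sketch of that lazy mechanism, assuming an abstract FieldData along the lines described (names and shapes are hypothetical; assumes org.apache.lucene.store.IndexInput):

          public abstract class FieldData {
            private IndexInput lazyInput; // stream kept until the value is actually needed
            private long lazyPointer;

            void setLazyData(IndexInput input, long pointer) {
              this.lazyInput = input;
              this.lazyPointer = pointer;
            }

            public String stringValue() {
              if (lazyInput != null) {        // first access: decode now
                try {
                  lazyInput.seek(lazyPointer);
                  read(lazyInput);
                } catch (IOException e) {
                  throw new RuntimeException(e);
                }
                lazyInput = null;
              }
              return decodedStringValue();
            }

            protected abstract void read(IndexInput in) throws IOException;
            protected abstract String decodedStringValue();
          }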

      I have one last issue with this patch. The current design allows reading an index in an old format and simply doing a writer.addIndexes() into a new format. With the new design, you cannot, because the writer will use the FieldData.write provided by the reader.

      enjoy !

      1. indexFormat-only.patch
        39 kB
        Nicolas Lalevée
      2. indexFormat.patch
        194 kB
        Nicolas Lalevée
      3. indexFormat.patch
        173 kB
        Nicolas Lalevée
      4. indexFormat.patch
        185 kB
        Nicolas Lalevée
      5. generic-fieldIO-5.patch
        98 kB
        Nicolas Lalevée
      6. generic-fieldIO-4.patch
        151 kB
        Nicolas Lalevée
      7. generic-fieldIO-3.patch
        169 kB
        Nicolas Lalevée
      8. generic-fieldIO-2.patch
        163 kB
        Nicolas Lalevée
      9. entrytable.patch
        43 kB
        Nicolas Lalevée
      10. ASF.LICENSE.NOT.GRANTED--generic-fieldIO.patch
        88 kB
        Nicolas Lalevée


          Activity

          Grant Ingersoll added a comment -

          Marking as won't fix, as I think the new flex indexing stuff takes care of this.

          Nicolas Lalevée added a comment -

           Synchronized with the trunk, so the patch now includes the payload feature. This allowed me to refactor into one class the payload writing, which is in two places today: it is now in the DefaultPostingWriter class.

           Since my last update, the TODO list is still open; nothing has been fixed. Furthermore, there is a regression in the new patch: ensureOpen() is not correctly handled for lazily loaded fields, so a test fails. This is because the FieldsReader doesn't handle it anymore in my patch. As the data structure can be customized, lazy loading is delegated to the FieldData created by the FieldsReader, so the two instances have to communicate about the closing of the streams. So that's a new item for the TODO list.

           As discussed in java-dev, here is a lighter patch with only the index format handling, without the possibility to redefine how data and postings are stored/retrieved.

          Nicolas Lalevée added a comment -

           Patch updated and synchronized with the trunk, r517330.
           I have removed the "svn mv" operations I had done, so the patch now applies fine on a fresh trunk. The svn mv was only about creating the impl package, so everything went back to o.a.l.index.

           A note about the last trunk commit I merged: lazy loading of the "proxstream". That feature is lost in this patch; I didn't take the time to merge it properly. I think it is highly feasible, just not done yet. So, a new item for the TODO list.

          Nicolas Lalevée added a comment -

           Hum... same here... This is due to some svn mv operations; I created the patch with svn diff.
           I can provide a patch with the complete diff, but you will lose the svn mv information, so the svn history of the files will be lost.
           Any advice is welcome. I will also ask my colleagues on Monday how they usually work with svn mv and patches.

          Grant Ingersoll added a comment -

          Hi Nicolas,

          I tried applying indexFormat.patch and am getting:
          grantingersoll@britta[1016]$ patch -p 0 -i ../patches/indexFormat.patch --dry-run
          patching file src/test/org/apache/lucene/store/IndexInputTest.java
          patching file src/test/org/apache/lucene/index/DocHelper.java
          patching file src/test/org/apache/lucene/index/TestIndexFormat.java
          can't find file to patch at input line 369
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------

          Property changes on: src/test/org/apache/lucene/index/TestIndexFormat.java
          ___________________________________________________________________
          Name: svn:keywords
          + Date Revision Author HeadURL Id
          Name: svn:eol-style
          + native
          Index: src/test/org/apache/lucene/index/impl/TestSegmentTermDocs.java
          ===================================================================
           --- src/test/org/apache/lucene/index/impl/TestSegmentTermDocs.java (revision 0)
          +++ src/test/org/apache/lucene/index/impl/TestSegmentTermDocs.java (working copy)
          --------------------------
          File to patch:

          --------
           Meaning, it doesn't know what to do with this diff. From the looks of it, TestSegmentTermDocs.java did not get moved to the impl directory from the directory it was in.

           I'm not sure how to handle this in SVN, but I suspect you have to do a local copy/move first. Perhaps try applying this patch to a clean checkout and let me know if it works for you. Also, perhaps we can collaborate with Doron to write up some benchmarks, or at least make sure the existing benchmarks cover this new approach.

          Nicolas Lalevée added a comment -

           Thanks Michael!
           I would appreciate a review and feedback, as this opens up the API a lot; it goes even further than just making Document public (LUCENE-778).

          Michael Busch added a comment -

          Nicolas,

           Wow, this looks like a lot of work!! There are good ideas in your patch. I have been and am currently very busy (moving to a new country...), so I probably won't have a chance to review it for another week or so.

          Michael

          Nicolas Lalevée added a comment -

           Patch update: synchronized with the trunk, plus new features.

           • The index format now has an ID which is serialized in a new file in the directory. This new file is managed by the SegmentInfos class. It has been put in a new file to keep me from breaking things, but it may be pushed into the segments file. This new feature will help avoid opening an index with the wrong code. Like the index version, if the index format is not compatible, opening the index fails. It also fails when trying to use IndexWriter#addIndexes(). These compatibility issues are managed by the implementations of the index format: an implementation has to implement the function canRead(String indexFmtID). But I think something is still missing in this design. Saying that one format is compatible with another is OK, but I still have to figure out whether it is really possible to make a reader that handles two different formats.
           • When synchronizing with the trunk, I had trouble with the new FieldSelectorResult, SIZE. This new feature expects the FieldsReader to know the size of the content of the field. With the generic FieldsReader, the data is only a sequence of bytes, so it cannot compute the size of the decoded data. I did a dumb implementation: it returns the size of the data in bytes. I know this is wrong, and the associated tests fail (I left them failing in the patch). It has to be fixed, and this may require some change in the API I have designed.
           • There was a discussion in java-dev about changing the order of the postings. Today, in the .frq file, the documents are ordered by document number. The proposal was to order them by frequency. So I worked a little on the mechanism I built to generify the field storing and applied it to posting storing. This part of the patch is not well (nearly not at all) documented and is a draft. But it works (at least with the current implementation); the mechanism allows implementing a custom PostingReader and PostingWriter:

           public interface PostingWriter {
             public void close() throws IOException;
             public long[] getPointers();
             public int getNbPointer();
             public long writeSkip(RAMOutputStream skipBuffer) throws IOException;
             public void write(int doc, int lastDoc, int nbPos, int[] positions) throws IOException;
           }

           public interface PostingReader {
             public void close() throws IOException;
             public TermDocs termDocs(BitVector deletedDocs, TermInfosReader tis, FieldInfos fieldInfos) throws IOException;
             public TermPositions termPositions(BitVector deletedDocs, TermInfosReader tis, FieldInfos fieldInfos) throws IOException;
           }

           Furthermore, this "generification" also allows an implementation that has been brought up many times: http://wiki.apache.org/jakarta-lucene/FlexibleIndexing
           Note that it does not break the current format. The .tis file is still managed internally by Lucene, and it holds pointers into some external files (managed by the IndexFormat). The implementation of the PostingReader/PostingWriter specifies how many pointers there are. The default is 2: .frq and .prx. The FlexibleIndexing approach would use 1.
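
           For instance, a hypothetical single-file postings writer in the FlexibleIndexing spirit might implement the PostingWriter interface above like this (a sketch for illustration, not code from the patch; the encoding choices here are assumptions):

           public class SingleFilePostingWriter implements PostingWriter {
             private final IndexOutput out; // one combined postings file instead of .frq + .prx

             public SingleFilePostingWriter(IndexOutput out) { this.out = out; }

             public int getNbPointer() { return 1; } // the .tis file then keeps a single pointer
             public long[] getPointers() { return new long[] { out.getFilePointer() }; }

             public void write(int doc, int lastDoc, int nbPos, int[] positions) throws IOException {
               out.writeVInt(doc - lastDoc); // delta-encoded doc number
               out.writeVInt(nbPos);         // frequency
               for (int i = 0; i < nbPos; i++) {
                 out.writeVInt(positions[i]); // positions, left unencoded in this sketch
               }
             }

             public long writeSkip(RAMOutputStream skipBuffer) throws IOException {
               long skipPointer = out.getFilePointer();
               skipBuffer.writeTo(out);      // append the buffered skip data in-line
               return skipPointer;
             }

             public void close() throws IOException { out.close(); }
           }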

           • To show that the default implementation of the index format can be changed, I have created a new package org.apache.lucene.index.impl which holds the current index format:
             • DefaultFieldData : the data part of Field
             • DefaultFieldsReader : the non-generified part of the FieldsReader
             • DefaultFieldsWriter : the non-generified part of the FieldsWriter
             • DefaultIndexFormat : the factory of readers and writers
             • DefaultPostingReader : just instantiates SegmentTermDocs and SegmentTermPositions
             • DefaultPostingWriter : the posting-writing part of DocumentWriter
             • SegmentTermDocs : just moved
             • SegmentTermPositions : just moved
           • Where I want to continue: I am mainly interested in the generic field storage, so I will continue to maintain it; I will try to fix the SIZE issue and will work on allowing readers to be compatible with each other. I am also interested in some generic index storage for faceted search. But I figured out that the indexed data would have to be stored at the document level, and this cannot be done with postings. So I don't think I will go further in playing with postings; I would rather look at LUCENE-584.
          Nicolas Lalevée added a comment -

           Here it is: I have synchronized with the current trunk, and I have split the patch into two parts.

          Nicolas Lalevée added a comment -

           Patch synchronized with the trunk.
           I also tried to minimize the diff. In fact, I just realized that there are two patches in one here:

           • the real object-oriented storage of field data;
           • some refactoring of the storage of the field infos, to reuse the indexed binary storage of a table of Strings.

           I will try to separate them.

          Nicolas Lalevée added a comment -

           Not at all.

           In fact, we don't use a Lucene modified with my patch in our system. I only started working with Lucene this year, and our search engine is too critical a component to play with a patched trunk. So I have not even tested it under real conditions.

          Grant Ingersoll added a comment -

          Hi Nicolas,

          Have you run any benchmarks on this? Once I finish up some documentation stuff, my plan is to start digging into this.

          -Grant

          Nicolas Lalevée added a comment -

          Here is an update of the patch:

           • merged with the last commit in trunk
           • I have fixed the issue with stream cloning (just reusing the same cloning approach as the current trunk)
           • the FieldData is back, so the Fieldable is back too, and the worry I had about exposing an internal function as public is gone
           • every test passes
           • I have moved the bunch of classes that implement the FieldReader/FieldWriter in an RDF way into the tests, so there are some tests of this extension mechanism
          Nicolas Lalevée added a comment -

           Reading the recent discussion on lucene-dev (LazyField use of IndexInput not thread safe), I just realized that the implementation I have done isn't thread safe at all. The input is not cloned at all...

          Nicolas Lalevée added a comment -

           It is due to lazy loading. A lazy field, when its data is retrieved, has to know how to read the stream. In the current trunk, a special implementation of Field does this. Here, we have no control over which implementation of Fieldable it will be. As I wanted to keep the lazy loading mechanism controlled internally by Lucene, transparent to the user, I had to force every Fieldable implementation to know how to retrieve data lazily. So I switched the interface to an abstract class: in fact, I moved AbstractField into Fieldable.
           But as I already mentioned, I still have an issue with it: the lazy loading mechanism isn't totally internal. The function Fieldable.setLazyData() should be package-private ("default"), not public.
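
           The constraint is a Java one: every method of an interface is implicitly public, so a package-private hook can only live on a class. A sketch of the intended end state (hypothetical, since in the patch setLazyData() is still public):

           public abstract class Fieldable {
             // Package-private ("default") visibility: only Lucene's FieldsReader,
             // in the same package, could install the lazy stream; user code could not.
             void setLazyData(IndexInput input, long pointer) { /* keep the stream for later decoding */ }
           }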

          Grant Ingersoll added a comment -

          Haven't looked fully at the patch, but one thing I am curious about is why remove the Fieldable interface?

          Nicolas Lalevée added a comment -

           I think I got it. What was disturbing in the last patch was the notion of FieldData I added, so I removed it. Let's summarize the diff between the trunk and my patch:

           • The concepts:
             • an IndexFormat defines which FieldsWriter and FieldsReader to use
             • an IndexFormat defines the file extensions used, so the user can add its own files
             • the format of an index is attached to the Directory
             • the whole index format isn't customizable, just parts of it. Some functions are private or package-private ("default"), so the Lucene user won't have access to them: it's Lucene-internal stuff. Others are public or protected: they can be redefined.
             • Lucene now provides an API to add files which are tables of data, like FieldInfos
             • it is up to the FieldsWriter implementation to check whether the field to write is of the expected format (basically via an instanceof check)
             • the user can add some information at the document level, and provide its own implementation of Document
             • the user can define how data for a field is stored and retrieved, and provide its own implementation of Fieldable
             • the reading of field data is done in the Fieldable
             • the writing of the field is done in the FieldsWriter
           • API changes:
             • there are new constructors of the Directory: constructors with a specified IndexFormat
             • new Entry and EntryTable: a generic API for managing a table of data in a file
             • FieldInfos now extends EntryTable
           • Code changes:
             • AbstractField becomes Fieldable (Fieldable is no longer an interface)
             • the FieldsWriter has been separated into the abstract class FieldsWriter and its default implementation DefaultFieldsWriter; likewise for FieldsReader and DefaultFieldsReader
             • the lazy loading has been moved from FieldsReader to Fieldable
             • IndexOutput can now write directly from an input stream
             • if a field was loaded lazily, the DefaultFieldsWriter copies the source input stream directly to the output stream
             • the IndexFileNameFilter now takes its list of known file extensions from the index format
             • each time a temporary RAM directory is created, the index format has to be passed: see the diff for CompoundFileReader or IndexWriter
             • some private and/or final members have been changed to public
           • Last worries:
             • quite a big one, in fact, but I don't know how to handle it: every RMI test fails because of:
               error unmarshalling return; nested exception is:
                   [junit]     java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
                   [junit] java.rmi.UnmarshalException: error unmarshalling return; nested exception is:
                   [junit]     java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
                   [junit]     at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:157)

             • a function is public and it shouldn't be: see Fieldable.setLazyData()

           I have added an example implementation in the patch that uses this feature: look at org.apache.lucene.index.rdf

          I know this is a big patch but I think the API has not been broken, and I would appreciate comments on this.
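
           A hypothetical end-to-end usage of the API summarized above, to make the intent concrete (the constructor shapes and the My* classes are guesses for illustration, not code from the patch):

           IndexFormat format = new MyRdfIndexFormat();                         // user-defined format, cf. org.apache.lucene.index.rdf
           Directory dir = FSDirectory.getDirectory("/path/to/index", format); // the format is attached to the Directory
           IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
           Document doc = new MyDocument();                                     // user-provided Document subclass
           doc.add(new MyRdfField("subject", "http://example.org/resource"));   // user-provided Fieldable
           writer.addDocument(doc);
           writer.close();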


            People

            • Assignee: Unassigned
            • Reporter: Nicolas Lalevée
            • Votes: 1
            • Watchers: 1
