Lucene - Core

LUCENE-3736 (sub-task of LUCENE-3733: Remaining TODOs of LUCENE-2858: Finalize AtomicReader/CompositeReader API)

ParallelReader is now atomic; rename it to ParallelAtomicReader and also add a ParallelCompositeReader (which requires identical subreader structure, e.g. by using LogDocMergePolicy)

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The plan is:

      • Move all subreaders to the ctor (builder-like API: first build the reader set, then call build())
      • Rename ParallelReader to ParallelAtomicReader
      • Add a ParallelCompositeReader with the same builder API, but taking any CompositeReader set and checking that the readers are aligned (identical docStarts). The subreaders are ParallelAtomicReaders. (A construction sketch follows.)
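
      For illustration, construction under this plan might look like the following sketch (the Builder method names are assumptions, not a committed API):

        ParallelAtomicReader par = new ParallelAtomicReader.Builder()
            .add(reader1)   // first build the reader set...
            .add(reader2)
            .build();       // ...then call build(); the built reader is unmodifiable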

      Attachments

      1. LUCENE-3736.patch
        70 kB
        Uwe Schindler
      2. LUCENE-3736.patch
        69 kB
        Uwe Schindler
      3. LUCENE-3736.patch
        69 kB
        Uwe Schindler
      4. LUCENE-3736.patch
        63 kB
        Michael McCandless
      5. LUCENE-3736.patch
        14 kB
        Michael McCandless
      6. LUCENE-3736.patch
        64 kB
        Uwe Schindler
      7. LUCENE-3736.patch
        63 kB
        Michael McCandless
      8. LUCENE-3736.patch
        65 kB
        Uwe Schindler
      9. LUCENE-3736.patch
        63 kB
        Uwe Schindler
      10. LUCENE-3736.patch
        54 kB
        Uwe Schindler
      11. LUCENE-3736.patch
        6 kB
        Uwe Schindler
      12. LUCENE-3736-improveTestCoverage.patch
        8 kB
        Uwe Schindler
      13. LUCENE-3736-readerMaps.patch
        13 kB
        Uwe Schindler
      14. LUCENE-3736-readerMaps.patch
        2 kB
        Uwe Schindler

        Activity

        Uwe Schindler added a comment - edited

        Here is just my cleanup work in ParallelReader, nothing new. It behaves as before; only "bugs" (missing open checks) and code violations (synthetic accessors, missing final fields) are fixed.

        The next step will be to remove the add() methods, as IndexReaders should not be changed after creation.

        Will work more tomorrow.

        The plan is:

        • Move all subreaders to the ctor (builder-like API: first build the reader set, then call build())
        • Rename ParallelReader to ParallelAtomicReader
        • Add a ParallelCompositeReader with the same builder API, but taking any CompositeReader set and checking that the readers are aligned (identical docStarts). The subreaders are ParallelAtomicReaders.
        Uwe Schindler added a comment -

        Attached is a patch implementing the above proposal using the builder pattern. The builder pattern (sorry, Robert) is the only clean setup that allows setting properties like ignoring stored fields on the parallel readers while keeping the built reader unmodifiable!

        Uwe Schindler added a comment -

        There are some test TODOs: the tests for parallel readers are very simplistic and use only 2 documents (which is especially inadequate for testing composite readers). We should raise the number of documents.

        Uwe Schindler added a comment -

        New patch with javadocs and improved tests (to check all builder settings).

        I will commit this later!

        Robert Muir added a comment -

        The builder pattern (sorry, Robert) is the only clean setup that allows setting properties like ignoring stored fields on the parallel readers while keeping the built reader unmodifiable!

        I think it's probably appropriate here; I think immutability for an IndexReader subclass is worth the pain.

        But maybe we don't need it to return itself on add()? I don't think building ParallelReaders is like building
        Strings; I think it can just return void for add()... (I can't think().of().a().situation().where().this() would help code readability)

        Michael McCandless added a comment -

        But maybe we don't need it to return itself on add()?

        +1, I would prefer that add() return void.

        Otherwise the patch looks great!

        Uwe Schindler added a comment -

        If you look at the test cases, the code is much less verbose and more readable.

        The good thing about the builder pattern (remember, the class is actually called "Builder") is that nobody is required to use chaining. But I prefer to chain calls, so I want to have the opportunity to do that.

        Returning itself instead of void makes no difference in bytecode or performance; it just adds a possibility. And everybody expects it when they see a class named "Builder" with a build() method.

        I().dont().want().to().start().fights().here().again(), but this time I will not change the patch to be forced into another pattern I don't like. Code without chaining here looks horrible.

        Just rewrite the test: you have to declare an additional variable with a very verbose name.

        If somebody wants to change this, he can do this in another issue called "remove all builders from Lucene", but then please also rename the methods away from build() and rename the classes.

        Robert Muir added a comment -

        Uwe, it's not a big deal to me really... just a suggestion.

        So +1 to commit, thanks for cleaning up ParallelReader.

        Uwe Schindler added a comment -

        Committed trunk revision: 1241470

        CHANGES and MIGRATE entries will be added in the parent issue.

        Michael McCandless added a comment -

        If you look at the test cases, the code is much less verbose and more readable.

        Well, you separated each method call onto a separate line, so, it'd
        look basically the same without chaining? If only all chained API
        consumers followed your approach...

        To be clear: what I dislike about chaining is it creates ambiguity
        in how you write the code that consumes our APIs.
        You().can().do().this(). Or, you().
        can().
        do().
        this(). Or maybe you().can().
        do().this().

        Ambiguity is very bad, especially in open-source dev: it invites
        bikeshed wars, harming communities by dividing them, spending precious
        time debating what from the outside would seem like trivial
        differences.

        If somebody wants to change this, he can do this in another issue called "remove all builders from Lucene"

        The first API I would fix is IndexWriterConfig; its setters should not
        be chainable, because now we get code like this:

        IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig( TEST_VERSION_CURRENT, new MockAnalyzer(random)).setMaxBufferedDocs(2).setMergePolicy(newLogMergePolicy()));
        

        Code is already hard enough to read... we should not make it even
        harder by enabling big hard-to-read compound expressions like this.
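
        For contrast, a non-chained version of the same configuration (a sketch, using the same test helpers as the snippet above):

          IndexWriterConfig iwc = newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random));
          iwc.setMaxBufferedDocs(2);
          iwc.setMergePolicy(newLogMergePolicy());
          IndexWriter writer = new IndexWriter(dir, iwc);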

        This phenomenon is [unfortunately] human nature, and also well outside
        of software development; see
        http://en.wikipedia.org/wiki/Parkinson's_Law_of_Triviality ... the
        less important the ambiguity the more brutal the bike shed wars will
        be. You see this also in Theodor Geisel ("Dr Seuss")'s delightful
        Sneetches (the stars on their stomachs), his Butter Battle Book (which
        side to butter the bread on), which end of the egg to crack (the
        Lilliputians in Gulliver's Travels), etc. It's not an uncommon
        problem.

        Battles over code styling are a great example of this phenomenon, and
        fortunately we long ago adopted a standard for Lucene so we don't
        argue (much!) about code style. There is one way and there is no
        (little!) ambiguity left.

        So I don't want to add any more chainable methods in Lucene. It
        creates an unnecessary ambiguity and I don't like where that will lead
        us. We should reduce ambiguity whenever we can: there should
        generally be one obvious way to do something.

        Michael McCandless added a comment -

        Another question: do we even need a ParallelCompositeReader? Can't we
        have only the Builder, and have it return a MultiReader?

        Michael McCandless added a comment -

        Reopening to address the chained APIs...

        Uwe Schindler added a comment -

        Another question: do we even need a ParallelCompositeReader? Can't we have only the Builder, and have it return a MultiReader?

        The reason we need ParallelCompositeReader is that the refCounting and close logic is different. MultiReader will close/refcount its children, but the ParallelCompositeReader must close/refcount the parallel subreaders. If you return a simple MultiReader, all tests fail, because the subreaders of the original parallel Composite/DirectoryReaders get closed (by the wrappers) while the original CompositeReaders stay open.

        If the parallel readers are DirectoryReaders, it goes very badly, as the wrapped SegmentReaders are closed...

        The change does the following: it returns a BaseMultiReader of ParallelAtomicReaders that wrap e.g. the SegmentReaders of a DirectoryReader. On close it decRefs or closes the original wrapped DirectoryReader, but does not touch the subreaders. In contrast, a normal MultiReader would close/decRef the inner ParallelAtomicReaders, which themselves close the SegmentReaders they wrap; the DirectoryReaders would then still be open.
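
        A hedged sketch of the close logic described here (the completeReaderSet field and closeSubReaders flag are assumed names, not necessarily the committed code):

          @Override
          protected synchronized void doClose() throws IOException {
            IOException ioe = null;
            // completeReaderSet: the original top-level parallel readers (assumed field)
            for (IndexReader reader : completeReaderSet) {
              try {
                if (closeSubReaders) {
                  reader.close();  // we own the parallel readers
                } else {
                  reader.decRef(); // only release our reference; never touch their subreaders
                }
              } catch (IOException e) {
                if (ioe == null) ioe = e;
              }
            }
            if (ioe != null) throw ioe;
          }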

        Michael McCandless added a comment -

        Patch, removing the chained APIs; I'd like to commit this soon, if there are no objections. Otherwise I think we should revert the first commit and continue iterating on this issue until we reach agreement.

        Uwe Schindler added a comment -

        Please keep the chained APIs; otherwise please also change all toString() methods in the whole of Lucene to no longer chain, and rename all classes so they are no longer called Builder with a build() method. If you keep the class names and disallow chaining, you break the pattern, and that will confuse people more.

        For me the issue is closed, as I disagree; somebody else can do this.

        Uwe Schindler added a comment -

        Patch, removing the chained APIs

        I disagree!

        Robert Muir added a comment -

        As I stated earlier on the issue, I agree technically with the builder here, for these reasons:

        • It's not like Document/Field, where we worry about making tons of objects during indexing.
        • As for immutable objects for IndexReader subclasses, that's a no-brainer from my perspective.
        • We need lots of checks that the 'structure' is the same to support per-segment search.

        So the only remaining issue is the style... yes, I don't personally like chaining, but I gave my
        +1 to the patch, because I thought Uwe's argument was reasonable (it's true it's not enforced),
        and because I don't think it's worth arguing over for such an expert API (there are bigger wins
        when it comes to cleaning up our APIs).

        Personally I am grateful to Uwe for all this work; we have wanted to fix these issues in ParallelReader
        for a long time (LUCENE-2766). It needed a policeman to do this, to get all the checks correct
        and good javadocs explaining how to make the composite case work.

        So maybe we should just open a separate issue for the style? I just feel we would lose so much
        if we went back to what we had before this change; I'd really prefer it not be backed out for
        that reason.

        Michael McCandless added a comment -

        I think the changes here are awesome too – ParallelReader has been
        badly unloved for a long time, so I'm very happy we are improving it
        here.

        But I don't think we need to have chained APIs to achieve that good
        progress. Chained APIs are dangerous. We added chained APIs in
        IndexWriterConfig and I think that was a mistake... it gave us lots of
        not-so-readable code. We shouldn't encourage that and we shouldn't
        add more.

        Michael McCandless added a comment -

        OK I opened LUCENE-3756 to address the chained APIs in IndexWriterConfig.

        Uwe Schindler added a comment -

        If somebody wants to commit the changes, do it. I want to stay out of this, as I don't want to be responsible when somebody opens an issue that says "Builder pattern without chaining is broken" (which it is). Do what you want, but keep me away from it!

        Just a suggestion:

        • As noted before, a Builder class with a build() method without chaining violates the pattern, so please rename it!

        I unassigned myself from this issue and will take care of some minor AtomicReaderContext changes (to remove one more utility method from ReaderUtil that is useless if the API implements what the javadocs say; I am talking about leaves()).

        Uwe Schindler added a comment -

        A comment came up on IRC:

        [17:21] mikemccand: well... we could also remove the builder
        [17:22] mikemccand: PR is expert

        IndexReaders are now unmodifiable and I was able to make numDocs, maxDoc, ... all final, so reverting the removal of ParallelReader.add() is not an option. I think Robert and I agree here; the old code was too risky. We should fix all of Lucene 4's APIs to remove non-immutable classes where possible.

        We could also move Parallel*Reader to contrib; it's very special and seldom used. The test in the facet module is not really needed (I was about to remove it already).

        Uwe Schindler added a comment -

        Mike: here is the updated patch for Lucene trunk, as reverse-merging the revert is impossible.

        It was not a good idea to revert before Steven's change...

        Michael McCandless added a comment -

        Thanks Uwe, I'll start from there and iterate...

        Michael McCandless added a comment -

        OK, I started from Uwe's last patch (thanks!) and then replaced the Builder
        construction API with straight (normal) constructors.

        This way each PR can be created just like MultiReader for the common
        case:

          new ParallelAtomicReader(ir1, ir2)
        

        For the expert case you pass in separate arrays for the "normal"
        readers and the "stored fields" readers (and a boolean for
        "closeSubReaders"). That means the app must create the arrays, but I
        think that's fine (Parallel*Reader is already expert, and that ctor is
        extra-expert).

        Everything else is the same: we now have ParallelAtomicReader and
        ParallelCompositeReader, and it does all the checking to make sure the
        passed-in atomic/composite readers are "congruent", etc.
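
        For illustration, the extra-expert case might look like this sketch (the exact signature is an assumption based on the description above):

          AtomicReader[] readers = new AtomicReader[] { ir1, ir2 };
          AtomicReader[] storedFieldsReaders = new AtomicReader[] { ir1 }; // stored fields from ir1 only
          ParallelAtomicReader pr = new ParallelAtomicReader(
              true /* closeSubReaders */, readers, storedFieldsReaders);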

        Uwe Schindler added a comment - edited

        Hi Mike,

        I attached a new patch with some obsolete code removed: in the days of the builder, to detect errors early, the subreader checks inspected not only the composite subs but also the leaves. As we no longer have a separation between Builder.add() and the ctor that does the actual work, we can remove the leaf checks; the recursive ctor performs the same checks, so it is impossible to end up with a different leaf structure.
        One check could be added instead: currently only the maxDocs of the subs are compared; maybe numDocs should be too. But here, too, the check is already done by the invoked ctors for the wrapped subreaders.

        Otherwise I am not happy with the telescoping ctor (sorry, the builder looked better for the expert case; this is now unreadable).

        The good thing is that we could add the freedom for storedFieldsReaders to be completely separate. It would be easy to implement: the closeSubReaders/parallelReaders arrays would need to be a union of both sets (currently only the readers ctor param).

        I did not know that Collections.newSetFromMap() is already in Java 1.6. We should remove the util class MapBackedSet in trunk and replace all occurrences with the same code as you used. I opened LUCENE-3764 for that.

        One small thing for "safety": MultiReader currently clones the reader arrays to prevent external modification. Both ParallelReaders should do the same. The builder enforced that before, as you had no access to the subs and the array was cloned on build (copied from ArrayList to array).
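
        The defensive clone is a one-liner in each ctor; a sketch:

          this.readers = readers.clone(); // callers can no longer mutate the array after construction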

        Michael McCandless added a comment -

        I attached a new patch with some obsolete code removed

        Thanks Uwe!

        One check could be added instead: currently only the maxDocs of the subs are compared; maybe numDocs should be too. But here, too, the check is already done by the invoked ctors for the wrapped subreaders.

        OK sounds like we can rely on the invoked ctors to catch this.

        Otherwise I am not happy with the telescoping ctor (sorry, the builder looked better for the expert case; this is now unreadable).

        Well, but the common case is simpler? And it's no longer an API break,
        besides the name change. I.e., you create the PAR/PCR like you do in
        3.x. I think that's the right tradeoff (the expert case can be more
        work...).

        I also don't like the API inconsistency this would start (we don't use
        builders to create other IndexReaders).

        The good thing is that we could add the freedom for storedFieldsReaders to be completely separate. It would be easy to implement: the closeSubReaders/parallelReaders arrays would need to be a union of both sets (currently only the readers ctor param).

        True, actually this would be pretty simple now I think?

        One small thing for "safety": MultiReader currently clones the reader arrays to prevent external modification. Both ParallelReaders should do the same. The builder enforced that before, as you had no access to the subs and the array was cloned on build (copied from ArrayList to array).

        Ahh, right.

        I started from your patch, added the array clones, and added the
        missing javadocs... I didn't yet add allowing arbitrary
        storedFieldsReaders but I think this wouldn't be so hard...

        Uwe Schindler added a comment -

        Patch is incomplete? Maybe you forgot to svn add.

        Michael McCandless added a comment -

        Patch is incomplete? Maybe you forgot to svn add.

        Ugh, indeed, sorry. Fixed!

        Uwe Schindler added a comment -

        Thanks, looks fine. Tomorrow I will look into an identity-based Set over all readers (storedReaders + readers) and use that for incRef/close. That's the easiest.

        One small improvement for shorter code:

        You can add all entries from an array to a collection with:

        final Set<CompositeReader> readersSet = Collections.newSetFromMap(new IdentityHashMap<CompositeReader,Boolean>());
        Collections.addAll(readersSet, readers);
        

        (it's just shorter)

        Uwe Schindler added a comment -

        Hi Mike,

        I found a bug in your implementation of the ParallelCompositeReader ctor; the test did not hit it because, stupidly, the number of vertical segments was 2, but so was the number of parallel subreaders! The validation (previously done in the builder, at the wrong place) simply iterated over the wrong set (vertical vs. parallel). I did it like the atomic case, as a validate method.

        Uwe Schindler added a comment - edited

        Patch that fixes the bugs in Mike's code and also allows decoupled stored fields readers:

        • Moved the composite reader checks from the main loop (where they were wrongly placed) to a validate method acting only on the top-level readers.
        • Improved the tests to use differing numbers of documents, subreaders, and parallel readers.
        • The only current limitation is that at least one "searchable" reader must be present; there can be any number (0..n) of stored-fields readers.
        • toString() shows the unique set of parallel readers, unfortunately unsorted (it is a hash-based set).
        • Added one more ctor, PR(closeSubReaders, IR subs...); this makes code that does not separate stored-fields readers look better, as closeSubReaders is not an expert option.

        Mike, can I take the issue again and commit this? Thanks for the refactoring; as we now allow separate stored-fields and main readers, the missing builder is fine with me - grrrr

        Uwe Schindler added a comment -

        Minor improvements:

        • Remove a compiler warning caused by a redundant cast.
        • Rename the reader identity set to be consistent in both impls.

        I think it's ready to commit.

        Michael McCandless added a comment -

        Ooh, nice catch on that validation bug!

        Yes, please feel free to take this one back. Your last patch looks great; I like the new ctor. +1 to commit.

        Uwe Schindler added a comment -

        A previous discussion on IRC with Mike:

        We think that the numDocs checks are not needed and prevent advanced use cases. The whole parallel-reader structure simply relies on identical maxDocs; not even hasDeletions needs to be checked. It should simply be documented that Parallel*Reader takes liveDocs/hasDeletions from the first reader and ignores the liveDocs of the other readers.

        In 3.x this was more of an issue, but in trunk, where liveDocs are completely separated, there is no need to check numDocs.

        Checking numDocs also adds no safety, because two readers can have different liveDocs but still the same numDocs.
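
        A sketch of the congruence check this leaves in place (illustrative, not the committed code):

          final int maxDoc = readers[0].maxDoc();
          for (AtomicReader reader : readers) {
            if (reader.maxDoc() != maxDoc) {
              throw new IllegalArgumentException("All readers must have the same maxDoc: " + maxDoc);
            }
            // numDocs/liveDocs are intentionally not compared; deletions come from the first reader.
          }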

        Uwe Schindler added a comment -

        New patch without the numDocs checks (added javadocs describing that the deletions are taken from the first reader). Also improved the tests to handle empty indexes for both reader variants.

        I will commit this later!

        Michael McCandless added a comment -

        Looks great, Uwe, thanks! +1

        Uwe Schindler added a comment -

        Committed trunk revision: 1242924

        Thanks to all for reviewing!

        Uwe Schindler added a comment -

        Here is a patch that improves test coverage and adds one more check to the composite parallel reader ctor:
        if somebody has two parallel composite readers, where the first one has atomic subreaders but the other one has composite subreaders (but correct maxDocs), the ctor fails with a ClassCastException.

        I will commit this now as a test improvement.

        Uwe Schindler added a comment -

        Committed trunk revision: 1245605

        Uwe Schindler added a comment -

        I committed some minor code cleanups in revision: 1245897

        Uwe Schindler added a comment -

        Robert also found some bugs in ParallelAtomicReader (in fact, it was Jenkins that found them). Some fixes should be documented here.

        Uwe Schindler added a comment -

        This is a patch for ParallelAtomicReader (after Robert's heavy commit, rev. 1291679); I found some other inconsistencies in some tests. One was already fixed (rev. 1291753).

        The new implementation builds the ParallelFields instance using the FieldsEnums of the parallel readers, which is much more correct than iterating FieldInfos and checking for indexed fields.

        The fieldInfos and fieldToReaderMap are now only used for retrieving stored fields and other global information; Fields is now completely separate.
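
        A hedged sketch of building ParallelFields from the readers' FieldsEnums (FieldsEnum was the field-iteration API on trunk at the time; the helper names here are assumptions):

          for (final AtomicReader reader : parallelReaders) {
            final FieldsEnum fe = reader.fields().iterator();
            String field;
            while ((field = fe.next()) != null) {
              fields.addField(field, fe.terms()); // register each indexed field with its Terms
            }
          }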

        Robert Muir added a comment -

        (after the heavy commit of Robert, rev. 1291679)

        Hey, maybe only a medium-heavy commit: I did reply to the Jenkins failure with a patch to the list before committing.

        The new implementation builds the ParallelFields instance using the FieldsEnums of the parallel readers, which is much more correct than iterating FieldInfos and checking for indexed fields.

        +1; when debugging I was a little confused not just about how it was built but about where in the code... I think this is more intuitive!

        Uwe Schindler added a comment -

        More simplifications (especially cleaned up the test TestParallelTermEnum).

        Also added a separate map for term vectors, to improve speed for large indexes with many fields.

        Uwe Schindler added a comment -

        Committed trunk revision: 1291889


          People

          • Assignee: Uwe Schindler
          • Reporter: Uwe Schindler
          • Votes: 0
          • Watchers: 0
