Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.0-BETA
    • Fix Version/s: 4.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      On dev@lucene.apache.org, Hoss reported that a Solr user was not able to update or delete documents in his 3.x index with Solr 4:

      On the solr-user list, Dirk Högemann recently mentioned a problem he was seeing when he tried upgrading his existing Solr setup from 3.x to 4.0-BETA. Specifically, this exception getting logged...

      http://find.searchhub.org/document/cdb30099bfea30c6

      auto commit error...:java.lang.UnsupportedOperationException: this codec can only be used for reading
      at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
      at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
      at org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
      at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
      at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
      at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
      at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
      at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
      at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
      at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
      at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)

      Dirk was able to work around this by completely re-indexing, but it seemed strange to me that this would happen.

      My understanding is that even though an IndexUpgrader tool is now available, users were not going to be required to run it when upgrading from 3.x to 4.x. Explicitly upgrading the index format might be a good idea, and might make the index more performant, but as I understood it, the way things had been implemented with codecs meant that explicitly upgrading the index format wasn't strictly necessary, and that users should be able to upgrade their Lucene apps the same way that was supported with past index format upgrades: the old index can be read, and as changes are made, new segments will be written in the new format. (Note in particular: at the moment we don't mention IndexUpgrader in MIGRATE.txt at all.)

      It appears, however, based on this stack trace and some other experiments I tried, that any attempt to "delete" documents in a segment that is using the Lucene3xCodec will fail.
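
      For illustration, here is a minimal sketch of the failing sequence as I understand it (the index path, field names, and document ids are made up): opening an existing 3.x index with a 4.0 IndexWriter and adding documents works, but a delete that touches a 3.x segment hits the UnsupportedOperationException from Lucene3xCodec when the deletes are applied on commit:

          import java.io.File;
          import org.apache.lucene.analysis.standard.StandardAnalyzer;
          import org.apache.lucene.document.Document;
          import org.apache.lucene.document.Field;
          import org.apache.lucene.document.StringField;
          import org.apache.lucene.index.IndexWriter;
          import org.apache.lucene.index.IndexWriterConfig;
          import org.apache.lucene.index.Term;
          import org.apache.lucene.store.Directory;
          import org.apache.lucene.store.FSDirectory;
          import org.apache.lucene.util.Version;

          public class Delete3xSegmentSketch {
            public static void main(String[] args) throws Exception {
              // hypothetical path to an index originally built with Lucene/Solr 3.x
              Directory dir = FSDirectory.open(new File("/path/to/3x/index"));
              IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40,
                  new StandardAnalyzer(Version.LUCENE_40));
              IndexWriter writer = new IndexWriter(dir, iwc);

              // adding a document works: new documents always go into new 4.0 segments
              Document doc = new Document();
              doc.add(new StringField("id", "new-doc", Field.Store.YES));
              writer.addDocument(doc);

              // deleting (or updating) a document that still lives in a 3.x segment
              // buffers the delete fine, but applying it calls
              // Lucene3xCodec.writeLiveDocs and throws UnsupportedOperationException
              writer.deleteDocuments(new Term("id", "old-3x-doc"));
              writer.commit();

              writer.close();
              dir.close();
            }
          }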

      This seems like a really scary time bomb situation, because if you upgrade, things will seem to be working – you can even add documents, and depending on the order in which you do things, some "old" segments may get merged into the new format, so some deletes of "old" documents (in those merged/upgraded segments) may work. But then somewhere down the road, you may try a delete that affects docs in a still-unmerged/unupgraded segment, and that delete will fail – and 5 minutes later, if another merge has happened, attempting the exact same delete may succeed.

      All of which raises the question: is this a known/intended limitation of the Lucene3xCodec, or an oversight?

      If it's expected, then it seems like we should definitely spell out this limitation in MIGRATE.txt and advocate either full rebuilds or the use of IndexUpgrader for anyone whose indexes are non-static.

      On the Solr side of things, I think we should even consider automatically running IndexUpgrader on startup if we detect that the Lucene3xCodec is in use, to simplify things – we can't even suggest running "optimize" as a quick/easy way to force an index format upgrade, because if the 3.x index was already optimized then it's a no-op and the index stays in the 3.x format.
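
      For reference, a minimal sketch of running IndexUpgrader programmatically (the path is hypothetical, and the exact constructor overloads may vary slightly between releases); it can also be invoked from the command line via its main() method:

          import java.io.File;
          import org.apache.lucene.index.IndexUpgrader;
          import org.apache.lucene.store.Directory;
          import org.apache.lucene.store.FSDirectory;
          import org.apache.lucene.util.Version;

          public class Upgrade3xIndexSketch {
            public static void main(String[] args) throws Exception {
              // hypothetical path to the existing 3.x index
              Directory dir = FSDirectory.open(new File("/path/to/3x/index"));
              // rewrites every segment that is not already in the current format;
              // after this, deletes/updates no longer touch Lucene3xCodec segments
              new IndexUpgrader(dir, Version.LUCENE_40).upgrade();
              dir.close();
              // roughly equivalent command line:
              //   java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader /path/to/3x/index
            }
          }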

      Robert said that this is an intended limitation (in fact it is explicitly added to the code; without that UOE it "simply works"), but I and lots of other people disagree:

      In the early days (I mean back when it was already read-only, before we refactored the IndexReader.delete()/Codec stuff), this was working, because the LiveDocs were always handled in a special way. Making it now 100% read-only is in my opinion very bad, as it no longer allows updating documents in a 3.x index at all, so you have no choice: you must run IndexUpgrader.

      The usual step of opening an old index and adding documents works (because new documents are always added to new segments), but the much more common IW.updateDocument(), which is commonly also used to add documents, fails on old indexes. This is a no-go; we have to fix this. If we allow the trick of updating LiveDocs on the 3.x codec, then for the end user the "read-only" nature of the Lucene3x codec would be completely invisible, as he can do everything IndexWriter provides. The other horrible things like changing norms are no longer possible, so deletes are the only thing affected here. The read-only-ness of the Lucene3x codec would then only be visible when someone tries to explicitly create an index with it. And I understood CHANGES/MIGRATE.txt exactly that way.

      On the list, Robert added a simple patch reverting the UOE in Lucene3xCodec, so that the LiveDocs format is read-write again.

      1. LUCENE-4339.patch
        5 kB
        Michael McCandless
      2. LUCENE-4339.patch
        3 kB
        Uwe Schindler

        Activity

        Uwe Schindler added a comment -

        This is Robert's patch.

        Robert Muir added a comment -

        I don't think this should be a blocker.

        Let's discuss it on the thread instead.

        Uwe Schindler added a comment -

        For me this is a blocker! I remember in the past we talked about it, but this UOE was silently added without my knowledge (otherwise I would have complained earlier). I would not +1 a release with this, so it's a blocker to me. In my recent talks at various conferences last year (new in Lucene 4.0, codecs), I always told people this is a "special case" and is possible.

        Uwe Schindler added a comment -

        I opened this issue for discussion, too.

        I totally agree that the Lucene3x codec should be read-only, so nobody should be able to create a new index or open IW with it as the default codec. But an upgrade path without document updates in existing segments (which imply deletion in the old segment) is no upgrade path. In that case we could remove the Lucene3x codec completely and only write a migration tool.

        Robert Muir added a comment -

        All you have to do is back down from the generalizations, agree we shouldn't touch trunk or the backwards compat policy here, and just say 'yes, this is the right thing to do for 4.x', and you will have my consensus.

        Otherwise I will be against the change.

        Because this backwards stuff held Lucene back for so long; in fact we wouldn't even be having this discussion if we hadn't departed from it so that we could refactor these APIs to make sense. Please look at what these APIs were like before this change was made.

        For me this is a blocker! I remember in the past we talked about it, but this UOE was silently added without my knowledge (otherwise I would have complained earlier).

        This is exactly why we shouldn't burden Lucene's backwards compatibility promise with crap like this: if we had enforced this, we wouldn't even be here, and Codec wouldn't encode the entire segment; it would have presented an insurmountable roadblock.

        Uwe Schindler added a comment -

        I have 2 things:

        a) Allow it for Lucene 4.x to make the migration as easy as possible. This is not really complicated, as we explicitly forbid something that works out of the box (because we have to rewrite SegmentInfos in all cases when new segments are added to a pre-4.x index).
        b) My concern about backwards compatibility is: the idea was always to allow user code (not advanced code using Codecs directly) to open a previous major version's index and add/update documents - this is all I want to have in the backwards policy. Adding is easy, as it creates new segments. Updating unfortunately needs to delete in the old segment. This requirement implies that a Codec can be read-only for backwards compatibility, but the LiveDocs implementation unfortunately must be read-write to support delete and update.

        Robert Muir added a comment -

        I already told you I agree with 'a' in this particular case, but not in every case ('b').

        It's unlikely my opinion here would change. Why conflate the two issues then?

        Leave 'b' alone, get it the hell out of this issue, and start another endless-thread-about-backwards-compat-policy like there always was in the past, and we can fix 'a'.

        Yonik Seeley added a comment -

        All you have to do is back down from the generalizations, agree we shouldn't touch trunk or the backwards compat policy here, and just say 'yes, this is the right thing to do for 4.x', and you will have my consensus.

        Otherwise I will be against the change.

        That doesn't make sense. You can disagree about back compat policies in general, but it's not kosher to hold this specific issue (which you agree with) hostage to force your viewpoint.

        +1 to commit.

        Robert Muir added a comment -

        It makes total sense. I gave my technical rationale for this. You can look at svn history if you don't believe me.

        I'm -1 for the record, until this gets resolved.

        Yonik Seeley added a comment -

        You're trying to leverage your vote on this issue (deletes for 3x segments, which you already agreed with) to try and get your way on another issue (general back compat policy). That's not a valid veto, but I'll let others fight it out.

        Robert Muir added a comment -

        I'm trying to stop this issue from being 'change both' and get the issue refined to just being deletes for 3.x segments (and not a change to general back compat).

        I don't think the patch should be committed to trunk for this reason (only 4.x, with a comment that this is just a case-by-case decision).
        I gave reasons why such a decision would serve as a roadblock for future (and actually past!) changes.

        I gave good reasons for this. It's a valid veto (which I will enforce). This is important to me, and I'm not going to back down. You can do whatever you like: whine to the Apache board for all I care. I still won't back down.

        Best to fix the scope of this issue to be only the 4.x change as a case-by-case decision and leave the back compat policy for discussion elsewhere
        if you really want this change to go in.

        Yonik Seeley added a comment -

        Folks: I still see no objections to enabling deletions on Lucene3x codecs (as this issue says, and as the patch does).
        I assume this functionality will be committed within the next few days. If it hasn't been yet due to meta-arguments (i.e. someone's reasoning around the patch rather than the patch itself), then I will commit.

        Uwe Schindler added a comment -

        The whole backwards compatibility discussion should be discussed (maybe in a flamewar again), but it is unrelated to this issue! My concerns regarding the backwards issue are a separate matter! I'll open a new one.

        I would like to apply the current patch, as developed by Robert, which allows Lucene 4.0 to open 3.x indexes and do the following:

        • add documents to it (this was always possible)
        • delete/update documents from the 3.x segments only. This is a small change because it was always working and was only willfully disabled in some cleanup commit.
        Uwe Schindler added a comment -

        I don't think the patch should be committed to trunk for this reason (only 4.x, with a comment that this is just a case-by-case decision). I gave reasons why such a decision would serve as a roadblock for future (and actually past!) changes.

        It cannot be committed to trunk, as there is no Lucene 3.x codec.

        Uwe Schindler added a comment -

        Finally, please apply this patch ++++++++++++++++++++1 !!!

        Robert Muir added a comment -

        The whole backwards compatibility discussion should be discussed (maybe in a flamewar again), but it is unrelated to this issue! My concerns regarding the backwards issue are a separate matter! I'll open a new one.

        Thanks, that's all I wanted.

        It cannot be committed to trunk, as there is no Lucene 3.x codec.

        I refer to the new test in TestBackwardsCompatibility. This would basically change the backwards compatibility policy in a significant way (one that restricts what we can do across major versions). That's why it's important to think about it and discuss it more, and not try to do that here.

        If we decide in a separate issue/thread that we want to extend back compat beyond reading (to also include writing), which we might do, then we should of course merge the new test at that time to trunk so that it's actually tested in some way. But we should wait for the discussion on that.

        Robert Muir added a comment -

        I would like to apply the current patch, as developed by Robert, which allows Lucene 4.0 to open 3.x indexes and do the following:

        add documents to it (this was always possible)
        delete/update documents from the 3.x segments only. This is a small change because it was always working and was only willfully disabled in some cleanup commit.

        This last statement is totally false. It was not always working. I had to disable this so that we could move all portions of the index under codec control (as I already said on the mailing list). The problem was that, at the time, Codec provided its own SegmentInfos writer so that it could customize commits. This was necessary to support e.g. AppendableCodec. Because of that it was not really possible to allow 3.x deletes: we had no real way to write a "3.x compatible commit that also contains newer 4.x information/4.x segments". The way SegmentInfosReader/Writer worked at the time presented no opportunity to make a "Frankenstein" 3.x/4.x mixed commit point to do this either.

        It is only easy to re-enable now because we did a ton of work to disentangle this in LUCENE-4050/LUCENE-4055, splitting the per-segment metadata (which is owned by the Codec) from the commit information (which is just a list of segments + delete gens). So we can easily write a 4.x commit and at the same time preserve the 3.x per-segment metadata.

        So this was no cleanup commit!

        And again, that's why I am against changing the backwards policy: if we had enforced this earlier we would likely never have been able to iterate to this point; it would have presented a massive roadblock.

        Robert Muir added a comment -

        I can commit this to 4.x once Mike has had a chance to look/think about it too.

        Just to reiterate again: this was not done for cleanup or silly reasons, but for good reasons.

        But I don't imagine us refactoring the codec API or index structure for 4.x in any way that
        would be made more difficult by this patch: I just want all possibilities left on the table
        for the next major release.

        Michael McCandless added a comment -

        +1 to separate what we do for 4.0 vs a change to our back-compat policy. It seems like (I have to think about the patch) doing this for 4.0 is easy ... but that may not necessarily hold true for future releases.

        And I had not realized our official back-compat policy all this time was only ensuring reading (not writing/updating) old indices (thanks Robert). I do worry that users will "expect" writing comes along with that (given that all past releases have allowed this). But then again the fact that no users have complained (until now) about 4.0 disallowing deletions on a 3.x index is telling ... maybe users re-index or run the upgrade tool.

        Hoss Man added a comment -

        And I had not realized our official back-compat policy all this time was only ensuring reading (not writing/updating) old indices (thanks Robert)

        I think there was a "bug" in the wording of that wiki page, because the understanding and discussion in all past releases (including in the described upgrade steps) was that Lucene would "read" your existing index and automatically convert as you made updates - and a delete is a type of update.

        +1 to separate what we do for 4.0 vs a change to our back-compat policy. It seems like (I have to think about the patch) doing this for 4.0 is easy ... but that may not necessarily hold true for future releases.

        Agreed – if we want to say that 5.0 can "read" 4.0 but can't delete docs, that's fine, but let's worry about that when a situation comes up where it actually makes a difference in performance/simplicity/maintainability. Hell, if it really makes a big difference, I'm fine with saying that you must run an upgrade tool for 5.0 to even read 4.0, but we shouldn't make those decisions in the abstract; they should be based on actual implications.

        the fact that no users have complained (until now) about 4.0 disallowing deletions on a 3.x index is telling.

        Personally I think people are reading too much into this - I suspect most of the people who have been using the alpha and beta so far are the more "adventurous" devs, who are more likely to "rebuild the world" on upgrade anyway. More "cautious" and "conservative" devs who will want to upgrade in place are probably not that interested in looking at 4.0 until 4.0-final.

        Bottom line: if we have a simple patch that will allow 4.0 to not only "read" 3.x indexes, but also easily "update" those indexes via deletes/merges, then I say we commit it.

        If enough folks feel strongly that this patch shouldn't be committed, and that IndexUpgrader should be used by any user who might want to "delete" docs from a 3x index, then I would argue that we not only need to more heavily document this, but we should also find some way to make the 4.0 IndexWriter "fail fast" when you point it at an index that contains segments using the 3x codec - we should not allow this time bomb situation where some doc updates/deletes might work because the segments have already been merged/upgraded to the 4x format, but other updates/deletes fail because the affected documents are still in a 3x segment.
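
        For what it's worth, a rough sketch of what such a fail-fast check might look like (written against the 4.0 SegmentInfos/SegmentInfo APIs as I understand them; the exact accessors may differ, and the helper name is made up):

            import org.apache.lucene.index.SegmentInfoPerCommit;
            import org.apache.lucene.index.SegmentInfos;
            import org.apache.lucene.store.Directory;

            public class Check3xSegmentsSketch {
              /** Throws if any segment in the latest commit was written by Lucene 3.x. */
              public static void failFastOn3xSegments(Directory dir) throws Exception {
                SegmentInfos infos = new SegmentInfos();
                infos.read(dir);  // reads the latest segments_N
                for (int i = 0; i < infos.size(); i++) {
                  SegmentInfoPerCommit si = infos.info(i);
                  String version = si.info.getVersion();  // e.g. "3.6.1" for an old segment
                  if (version == null || version.startsWith("3.")) {
                    throw new IllegalStateException("Index contains a 3.x segment ("
                        + si.info.name + "); run IndexUpgrader or re-index before updating");
                  }
                }
              }
            }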

        Shai Erera added a comment -

        I don't think that users complaining or not should affect our back-compat policy. I suspect many people trying 4.0-* didn't yet deploy it in production, at least not large productions. So we cannot let that affect our decisions. I'm pretty sure that in the Big Data world, it's not going to be accepted that you need to re-index the data, or run a migration tool over indexes that may be TBs in size, just because it may be hard to support deleting documents from older (1 version back only though !) segments.

        In short, let's postpone decisions such as "you must run an upgrader tool" until a real problem comes up in one of the future releases. There's no need to make such decision at this point.

        Robert Muir added a comment -

        Bottom line: if we have a simple patch that will allow 4.0 to not only "read" 3.x indexes, but also easily "update" those indexes via deletes/merges, then I say we commit it.

        See Mike's comment ("I have to think about the patch"). The patch is a bit scary. I feel like I'm in the same boat. I think if it's safe we should do it, but it's not totally obvious that it's safe.

        we should not allow this time bomb situation where some doc updates/deletes might work because the segments have already been merged/upgraded to the 4x format, but other updates/deletes fail because the affected documents are still in a 3x segment.

        Whoa: it's not really a time bomb. The time bomb is Solr's 'autocommit' that this user was using. And that autocommit will have the same 'time bomb' if the user runs out of disk space or whatever. If they aren't aware precisely when commits are happening, then when an error happens (which will happen eventually), they won't be sure which changes made it in and which didn't.

        In general, for people properly handling their exceptions from commit() etc., it's fine, because the commit will never succeed and they will know their changes didn't make it.

        It's just not user-friendly.

        Robert Muir added a comment -

        I don't think that users complaining or not should affect our back-compat policy. I suspect many people trying 4.0-* didn't yet deploy it in production, at least not large productions. So we cannot let that affect our decisions. I'm pretty sure that in the Big Data world, it's not going to be accepted that you need to re-index the data, or run a migration tool over indexes that may be TBs in size, just because it may be hard to support deleting documents from older (1 version back only though !) segments.

        In short, let's postpone decisions such as "you must run an upgrader tool" until a real problem comes up in one of the future releases. There's no need to make such decision at this point.

        No, it's the opposite: let's postpone decisions to change the backwards compatibility policy to also cover 'writes' to a separate issue.

        If they get conflated with this issue, then my opinion towards this issue changes dramatically (I will be against it).

        Uwe Schindler added a comment -

        Can we please stop bringing the backwards policy into this issue? I already said I will open another issue. This one is just about supporting this in 4.0!

        Hoss Man added a comment - edited

        Whoa: it's not really a time bomb. The time bomb is Solr's 'autocommit' that this user was using. And that autocommit will have the same 'time bomb' if the user runs out of disk space or whatever.

        Autocommit has absolutely nothing to do with the problem I'm describing – it isn't scary because it can happen when you use autocommit; it's scary because it might happen on some commits but not on others.

        (perhaps "timebomb" is not the right word, because it implies it will absolutely happen at some fixed point in the future. "unpredictible bomb" is be a better description)

        The point is: when you upgrade, some deletes might work, and other deletes might fail, all depending on if/how/when you sprinkle in some adds that might automatically trigger segment merges.

        That is the major concern I'm worried about; a user could do the following steps:

        1) upgrade to lucene 4.0
        2) open their 3x index with 4.0
        3) test that some searches X,Y,Z work
        4) add some number of documents N and test that they show up in searches.
        5) delete/update some old documents A,B,C and test that the delete/updates are applied

        ...and think everything looks fine. No failures, no exceptions.

        If they then attempt the exact same set of steps again, on the exact same index, but change the number of documents "N" that they add, then things could blow up (EDIT: "in step #5") – all entirely depending on which segments were merged in step #4.

        The reason I called it a time bomb is because for the usage patterns I'm mostly familiar with – where lots of documents are added frequently, and the likelihood of updating/deleting a document decreases as the document gets "older" – this sort of problem might not manifest itself for a very long time after upgrading, because the small "recent" segments that get merged quickly after upgrade (as new docs are added) are also the ones most likely to see deletes/updates – which will succeed. It's only when that rare situation of "updating" a really old document in a still-unmerged 3.x segment happens that users will suddenly get this exception – and be surprised, because they may have already done lots of deletes on their "existing" index without seeing this problem.
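
        To make the nondeterminism concrete, here is a small hypothetical sketch (field names and the helper are made up; the writer is assumed to be a 4.0 IndexWriter opened on an in-place upgraded 3.x index). Whether the final delete succeeds depends entirely on whether the merges triggered while adding the N documents happened to rewrite the 3.x segment that still holds the old document:

            import org.apache.lucene.document.Document;
            import org.apache.lucene.document.Field;
            import org.apache.lucene.document.StringField;
            import org.apache.lucene.index.IndexWriter;
            import org.apache.lucene.index.Term;

            public class UnpredictableDeleteSketch {
              // steps 4 and 5 from above, against a writer opened on a 3.x index
              static void addThenDelete(IndexWriter writer, int n) throws Exception {
                for (int i = 0; i < n; i++) {
                  Document doc = new Document();
                  doc.add(new StringField("id", "new-" + i, Field.Store.YES));
                  writer.addDocument(doc);   // may or may not trigger merges of old 3.x segments
                }
                // succeeds if the segment holding "really-old-doc" was merged (and thus
                // rewritten in the 4.0 format) above; otherwise commit() hits the UOE
                writer.deleteDocuments(new Term("id", "really-old-doc"));
                writer.commit();
              }
            }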

        Uwe Schindler added a comment - edited

        I agree with Hoss: the problem is not only that it does not work, it's also that it sometimes works and sometimes doesn't, and IndexWriter does not fail fast. So if we do not commit this, we have to make it fail fast. But I strongly prefer to commit it - I also looked at the code, and the handling of LiveDocs is fine from my perspective, as the index cannot get corrupt:

        • The LiveDocs implementation is the same for 4.0 and 3.x codec
        • When the new livedocs are finally written to disk (after commit), the index automatically gets a new commit, so the 3.x/4.0 codec will already write new SegmentInfos, and the index is already partially upgraded (because new segments will also be 4.0).

        I strongly want to commit this; maybe we should add a better test. I can look into it tomorrow if nobody else wants to jump in (I would be so happy).

        Robert Muir added a comment -

        I also looked at the code, the handling of LiveDocs is fine from my perspective, as the index cannot get corrupt

        This is what I think too, but I'd rather be more confident (e.g. get Mike's +1). We have a limited capability to really test this scenario. But lots of stuff was cleaned up, so I feel pretty good about it myself. That doesn't make me confident enough to commit, though... some of this is probably just paranoia/brain-damage from how PreFlex worked in earlier iterations of trunk.

        That's the risk: today the behavior is not "user-friendly", but it causes no corruption or data loss. If you had asked me 'can you delete things from 3.x segments?' I would have told you NO, so it's no surprise to me (not a bug IMO).

        I agree it would be fantastic to have easier backwards compatibility for users migrating – sure, who wouldn't? But this isn't the only possible solution we have at our disposal, so let's just be careful to ensure we don't introduce a nasty bug.

        So please no heavy committing, there is no rush.

        Michael McCandless added a comment -

        +1, the patch looks good!

        The delGen/delCount are separately tracked in SegmentInfoPerCommit,
        the first time we read a 3.x index we read the legacy SegmentInfos
        format into RAM with the proper delCount/delGen
        (Lucene3xSegmentInfosReader.readLegacyInfos). If any new deletes
        happen against that segment, SIPC will be updated, and on commit we'll
        write delGen/delCount just like we would for a non-3x index. We use a
        separate method (SegmentInfos.write3xInfo) to write just the
        SegmentInfo for 3.x segments.

        SIPC.files() will correctly show the new del gen filename on write
        since we just use Lucene40LiveDocsFormat.

        Michael McCandless added a comment -

        I improved the test case a bit, and that uncovered a bug in how we upgrade a 3.x SegmentInfo: we were not recording the marker file in the SI's fileSet, which caused IFD to delete it and the upgrade to keep happening every time commit was called. I fixed it by adding the marker file up front, before we write the .si.

        Uwe Schindler added a comment -

        Cool, thanks. Was the bug also there before this patch (meaning other code merging old segments could be affected)?

        Michael McCandless added a comment -

        Yes, this bug is pre-existing ... but I think it doesn't break anything, i.e., it's a performance bug, since on every commit it needlessly goes and writes _N.si files again.

        Robert Muir added a comment -

        Thanks Mike!

        Uwe Schindler added a comment -

        Thanks Mike and Robert - you make me happy! One blocker less, hihi.

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee: Unassigned
          • Reporter: Uwe Schindler
          • Votes: 0
          • Watchers: 3
