Lucene - Core
LUCENE-702: Disk full during addIndexes(Directory[]) can corrupt index

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.1
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      This is a spinoff of LUCENE-555.

      If the disk fills up during this call then the committed segments file can reference segments that were not written. Then the whole index becomes unusable.

      Does anyone know of any other cases where disk full could corrupt the index?

      I think disk full should at worst lose the documents that were "in flight" at the time. It shouldn't corrupt the index.

      Attachments

      1. LUCENE-702.patch
        73 kB
        Michael McCandless
      2. LUCENE-702.take2.patch
        77 kB
        Michael McCandless
      3. LUCENE-702.take3.patch
        77 kB
        Michael McCandless

        Activity

        Ning Li added a comment -

        A possible solution to this issue is to check, when writing segment infos to "segments" in directory d,
        whether dir of a segment info is d, and only write if it is. Suggestions?

        The following is my comment on this issue from the mailing list documenting how Lucene could
        produce an inconsistent index if addIndexes(Directory[]) does not run to its completion.

        "This makes me notice a bug in current addIndexes(Directory[]). In current addIndexes(Directory[]),
        segment infos in S are added to T's "segmentInfos" upfront. Then segments in S are merged to T
        several at a time. Every merge is committed with T's "segmentInfos". So if a reader is opened on T
        while addIndexes(Directory[]) is going on, it could see an inconsistent index."
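
        A minimal sketch of the check being proposed, using 2.x-era names
        like SegmentInfo.dir; the method itself is illustrative, not
        Lucene's actual private API:

        // Illustrative only: when committing, skip infos whose files still
        // live in a source directory rather than in the directory d being
        // written to.
        private void writeSegmentInfos(SegmentInfos infos, Directory d) throws IOException {
          SegmentInfos toCommit = new SegmentInfos();
          for (int i = 0; i < infos.size(); i++) {
            SegmentInfo si = infos.info(i);
            if (si.dir == d) {          // only segments that actually exist in d
              toCommit.addElement(si);
            }
          }
          toCommit.write(d);            // commit references only to local segments
        }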

        Michael McCandless added a comment -

        That seems like a reasonable approach? At least the index would be
        consistent (ie, loadable).

        Though, if you hit disk full part way through, then some of your
        indexes were added and some were not. How do you "recover" after you
        free up your disk space? If you just re-add all indexes then you have
        duplicates in the index.

        Is it possible instead to not reference (in the written segments info
        file) those segments that were carried over from the input
        readers/directories? This would make the operation transactional, so
        that if we crashed part way through, then the index rolls back to
        where it was before the addIndexes call. And user could free up disk
        space and try again, without creating dups.

        (One problem with this is that the orphaned segments that had been
        written would not actually get deleted automatically; lockless commits
        would fix that, though, because on instantiating a reader it recomputes
        which files are unreferenced and then deletes them.)

        Michael McCandless added a comment -

        I think we should try to make all of the addIndexes calls (and more
        generally any call to Lucene) "transactional". Meaning, if the call
        is aborted (machine crashes, disk full, jvm killed, neutrino hits CPU,
        etc.) then your index just "rolls back" to where it was at the start
        of the call. Ie, it is consistent and none of the incoming documents
        were added.

        This way your index is fine after the crash, and you can fix the
        cause of the crash and re-run the addIndexes call without getting
        duplicate documents.
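
        As a caller-side sketch of that guarantee (freeUpDiskSpace is a
        hypothetical application-level hook, not part of Lucene):

        try {
          writer.addIndexes(sourceDirs);     // aborted by disk full, crash, etc.
        } catch (IOException e) {
          // the index has rolled back to its pre-call state: consistent,
          // none of the incoming documents were added
          freeUpDiskSpace();                 // hypothetical recovery step
          writer.addIndexes(sourceDirs);     // safe to retry; cannot create dups
        }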

        To achieve this, each of the three addIndexes methods would need to 1)
        not commit a new segments file until the end, and 2) not delete any
        segments referenced by the initial segments file (segmentInfos) until
        the end.

        We have three methods now for addIndexes:

        • For addIndexes(IndexReader[]): this method is I think already
          transactional. We create a merger, add all readers to it, do the
          merge, and only at the end commit the new segments file & remove
          old segments.
        • For addIndexes(Directory[]): this method can currently corrupt the
          index if aborted. However, because all merging is done only on
          the newly added segments, I think the fix is simply to not commit
          the new segments file until the end?
        • For addIndexesNoOptimize(Directory[]): this method can also
          currently corrupt the index if aborted. To fix this I think we
          need to not only prevent committing a new segments file until the
          end, but also to prevent deletion of any segments in the original
          segments file. This is because it's able (I think?) to merge
          both old and new segments in its step 3. This would normally
          result in deleting those old segments that were merged.

        Note that this will increase the temporary disk usage used during
        the call, because old segments must remain on disk even if they
        have been merged, but I think this is the right tradeoff
        (transactional vs temporary disk usage)?

        Also note that we would need the fixes from lockless commits
        (LUCENE-701) to properly delete orphaned segments after an abort.
        Without the IndexFileDeleter I think they would stick around
        indefinitely.

        Does this approach sound reasonable/right? Any feedback?

        Ning Li added a comment -

        > I think we should try to make all of the addIndexes calls (and more
        > generally any call to Lucene) "transactional".

        Agree. Transactional semantics would be better.

        The approach you described for the three addIndexes methods looks good.

        addIndexes(IndexReader[]) is transactional but has two commits: one
        when existing segments are merged at the beginning, the other at the
        end when all segments/readers are merged.

        addIndexes(Directory[]) can be fixed to have a similar behaviour: the
        first commit when existing segments are merged at the beginning, then
        a second at the end when all old/new segments are merged.

        addIndexesNoOptimize(Directory[]), on the other hand, does not merge
        existing segments at the beginning. So when fixed, it will only have
        one commit at the end which captures all the changes.

        Michael McCandless added a comment -

        I've attached a patch with changes as described below. I will commit
        in a few days if no one objects!

        All unit tests pass.

        This patch fixes addIndexes to 1) not corrupt the index on exception,
        2) be transactional (either all or nothing was actually added), and 3)
        to leave the writer instance in a consistent state (meaning, on an
        exception, the segmentInfos state is restored to its starting point,
        so that it matches what's actually in the index, and any unreferenced,
        presumably partially written, files in the index are removed).

        I've also fixed IndexWriter.mergeSegments() and IndexReader.commit()
        to keep the instance "consistent" on exception as well (ie, even when
        not being called from addIndexes). Meaning, on an exception, any
        changes to segmentInfos state are rolled back, any opened readers (in
        the merger) are closed, and non-committed files are removed.

        I've added two unit tests (one for IndexWriter.addIndexes* and one for
        IndexReader.close) that expose these various cases as failures in the
        current Lucene sources, which then pass with this patch.

        These unit tests use a handy new MockRAMDirectory that can simulate
        disk full, inject random IOExceptions, measure peak disk usage, etc.
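
        For illustration, a disk-full test using it might look roughly like
        this inside a test method (setMaxSizeInBytes and sourceDir are
        assumptions based on the description above, with the usual
        org.apache.lucene imports):

        MockRAMDirectory dir = new MockRAMDirectory();
        // ... build a small starting index in dir, then:
        dir.setMaxSizeInBytes(dir.sizeInBytes() + 1024); // "disk" fills after ~1 KB more

        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
        try {
          writer.addIndexes(new Directory[] { sourceDir }); // should hit disk full
        } catch (IOException e) {
          // expected
        }
        // the index must still open cleanly and contain only the original docs
        IndexReader reader = IndexReader.open(dir);
        reader.close();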

        Here is the summary of the approach:

        • Added private methods startTransaction(), rollbackTransaction()
          and commitTransaction() to IndexWriter. During a transaction
          (inTransaction = true), I block changes to the index that alter
          how it was at the start of the transaction: we don't write any new
          segments_N files (but, do update the segmentInfos in memory, and
          do write / merge new segment files); we are not allowed to delete
          any segment files that were in the index at the start (added
          "protectedSegments" to track this during a transaction).
        • For the addIndexes methods, I first call startTransaction(), then
          do the actual work in a try block, and in a finally block call
          either commitTransaction() or rollbackTransaction() (see the
          sketch after this list).
        • Fixed mergeSegments to respect whether it's in a transaction (and
          not commit/delete). Also, fixed this method so that if it's not
          in a transaction (ie, being called by optimize or
          maybeMergeSegments) it still (in a try/finally) leaves
          segmentInfos "consistent" on hitting an exception, and removes any
          partial files that had been written.
        • Fixed IndexReader.commit with similar rollback logic to reset
          internal state if commit has failed.
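
        Put together, the pattern looks roughly like this (the three
        transaction methods are the ones this patch adds; mergeAddedIndexes
        is an illustrative stand-in for the per-method work):

        public synchronized void addIndexes(Directory[] dirs) throws IOException {
          startTransaction();          // snapshot segmentInfos; protect existing segments
          boolean success = false;
          try {
            mergeAddedIndexes(dirs);   // writes new segment files, updates segmentInfos
            success = true;            // in memory only; no segments_N written yet
          } finally {
            if (success) {
              commitTransaction();     // single segments_N write, at the very end
            } else {
              rollbackTransaction();   // restore segmentInfos; remove unreferenced files
            }
          }
        }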

        It is actually possible to get an IOException from
        addIndexes(Directory[]) yet see that all docs were in fact added. This
        happens when the disk full is hit in building the CFS file on the
        final optimize. In this case, the index is consistent, but that
        segment will remain in non-CFS format and will show all docs as added.

        Various other small changes:

        • Changed RAMDirectory, and its createOutput method, to be
          non-final. Also changed private -> protected for some of its
          instance variables.
        • Created MockRAMDirectory subclass.
        • Moved the RAMDirectory.getRecomputedSizeInBytes() into
          MockRAMDirectory.
        • Changed IndexFileDeleter to use a HashSet (instead of Vector) to
          track pending files since with these changes the same file can be
          added to the pending set many times.
        • Added some javadocs around temporary disk usage by these methods
          (this came up on java-user recently). Also included a check in
          one of the unit tests to assert this.
        • In SegmentInfos, separately track (in memory only) which
          generation we will use for writing vs which generation was last
          successfully read. These are almost always the same, but can
          differ when a commit failed while writing the next segments_N
          file.
        • Changed package protected IndexFileDeleter.commitPendingFiles() to
          actually delete the files (previously you also had to separately
          call deleteFiles).
        • Added SegmentInfos.getNextSegmentFileName()
        • Added SegmentInfos.clone() to also copy the contents of the vector
          (ie, each SegmentInfo); see the sketch after this list.
        • Added SegmentInfo.reset(), which is used to "copy back" a previous
          clone of a SegmentInfo (IndexReader uses this to rollback).
        • Added package protected SegmentReader.getSegmentName()
        • Fixed a small bug in IndexFileDeleter that was failing to remove an
          un-referenced CFS file on a segment that wasn't successfully
          converted to a CFS file (and added a case to TestIndexFileDeleter
          to expose the bug first).
        • Fixed a few minor typos.
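
        The deep clone mentioned above might look like this (SegmentInfos
        extends Vector in this era of Lucene; SegmentInfo.clone() is assumed
        here):

        public Object clone() {
          SegmentInfos copy = (SegmentInfos) super.clone();  // shallow Vector copy
          for (int i = 0; i < copy.size(); i++) {
            // replace each element with its own copy so a rollback snapshot
            // is not mutated by later changes to the live SegmentInfos
            copy.set(i, ((SegmentInfo) copy.get(i)).clone());
          }
          return copy;
        }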
        Ning Li added a comment -

        It looks good. My two cents:

        1. In the two rollbacks in mergeSegments (where inTransaction is false), the segmentInfos' generation is not always rolled back. So something like this could happen: two consecutive successful commits write segments_3 and segments_5, respectively. Nothing is broken, but it'd be nice to roll back completely (even for the IndexWriter instance) when a commit fails.

        2. Code serving two purposes is (and has been) mixed in mergeSegments: one to merge segments and create the compound file if necessary, the other to commit or roll back when inTransaction is false. It'd be nice if the two could be separated: optimize and maybeMergeSegments call mergeSegmentsAndCommit, which creates a transaction, calls mergeSegments and commits or rolls back; mergeSegments doesn't deal with commit or rollback. However, currently the non-CFS version is committed first even if useCompoundFile is true. Until that's changed, mergeSegments probably has to continue serving both purposes.

        Michael McCandless added a comment -

        Thanks for the review Ning!

        > 1. In the two rollbacks in mergeSegments (where inTransaction is
        > false), the segmentInfos' generation is not always rolled back. So
        > something like this could happen: two consecutive successful commits
        > write segments_3 and segments_5, respectively. Nothing is broken,
        > but it'd be nice to roll back completely (even for the IndexWriter
        > instance) when a commit fails.

        This is actually intentional: I don't want to write to the same
        segments_N filename, ever, on the possibility that a reader may be
        reading it. Admittedly, this should be quite rare (filling up disk
        and then experiencing contention, only on Windows), but still I wanted
        to keep "write once" even in this case.
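
        A sketch of how the two generations interact (the field names and
        getNextSegmentFileName follow the patch description; the method
        bodies are illustrative):

        long generation;      // generation to use for the next segments_N write
        long lastGeneration;  // generation of the last successful commit/read

        String getNextSegmentFileName() {
          // lockless-commits style name, e.g. segments_3
          return "segments_" + Long.toString(generation, Character.MAX_RADIX);
        }

        void commitSucceeded() { lastGeneration = generation++; }

        void commitFailed() {
          // lastGeneration stays put (disk still holds the old file), but
          // generation is NOT rolled back, so a retry writes a fresh file name
          generation++;
        }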

        > 2. Code serving two purposes is (and has been) mixed in
        > mergeSegments: one to merge segments and create compound file if
        > necessary, the other to commit or roll back when inTransaction is
        > false. It'd be nice if the two could be separated: optimize and
        > maybeMergeSegments call mergeSegmentsAndCommit, which creates a
        > transaction, calls mergeSegments and commits or rolls back;
        > mergeSegments doesn't deal with commit or rollback. However,
        > currently the non-CFS version is committed first even if
        > useCompoundFile is true. Until that's changed, mergeSegments
        > probably has to continue serving both purposes.

        I agree, mergeSegments is doing two different things now: merging
        segments, and, committing this change (in 2 steps, for the CFS case)
        into the index (if not in a transaction).

        I had in fact tried putting a transaction up at the
        optimize/maybeMergeSegments level, and it worked, but there was one
        severe drawback: the max temp free space required would be [up to] 2X
        the starting size of the segments-to-be-merged because the original
        segments would be there (1X), the newly merged separate segment files
        would be there (another 1X) and the just-created CFS segment file
        would also be there (another 1X).

        Whereas the max temp space required now is 1X the starting size of
        segments-to-be-merged.
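
        As a concrete example: say the segments being merged total 1 GB, with
        compound files enabled. With one transaction spanning the whole
        operation, the 1 GB originals, the roughly 1 GB merged non-compound
        files, and the roughly 1 GB CFS must all coexist before anything can
        be deleted, i.e. up to 2 GB of temporary space beyond the starting
        index. Committing the non-compound merge first lets the originals be
        deleted before the CFS is built, so at most about 1 GB of extra space
        is in use at any one time.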

        So I think this (doing 2 commits with the current source before my
        patch) is intentional, to keep the temp free space required at 1X.

        It's also [relatively] OK to commit that first time, at least
        functionally: the index is consistent and has all docs. The only
        downside is that if you hit disk full or other exception when creating
        the CFS file then your index has one segment not in compound format (I
        will call this out in the javadocs).

        OK I've added a unit-test that explicitly tests max temp space
        required by just optimize and asserts that it's at most 1X starting
        index size. Will attach a patch shortly!

        Michael McCandless added a comment -

        OK I attached a new patch with changes to only javadocs & unit tests:

        • Fixed the disk full unit test to use "richer" documents so indexes
          shrink less on merging/optimizing (ie make the test case "harder"
          to satisfy the disk usage check).
        • Added new test case for temp disk usage of optimize. Verified
          that it fails if we put a transaction around mergeSegments call in
          optimize (as described above).
        • Fixed javadocs for addIndexes: we actually require up to 2X the
          total input size of all indices. Fixed unit test to assert this.
        • Fixed javadocs in IndexWriter's optimize, addIndexes,
          addDocument to describe disk usage and index state after an
          IOException is thrown.
        • Improved how MockRAMDirectory tracks/enforces max usage.
        • Other small fixes to unit test.
        Ning Li added a comment -

        > This is actually intentional: I don't want to write to the same
        > segments_N filename, ever, on the possibility that a reader may be
        > reading it. Admittedly, this should be quite rare (filling up disk
        > and then experiencing contention, only on Windows), but still I wanted
        > to keep "write once" even in this case.

        In IndexWriter, the rollbackTransaction call in commitTransaction could
        cause a write to the same segments_N filename, right?

        The "write once" semantics are not kept for segment names or .delN
        files. This is OK because no reader will read the old versions.

        Michael McCandless added a comment -

        > In IndexWriter, the rollbackTransaction call in commitTransaction could
        > cause a write to the same segments_N filename, right?

        Good catch – you are correct! OK, I fixed rollbackTransaction to do
        the same in-place rollback of the segmentInfos that I do in
        mergeSegments (patch attached). This means if another attempt is made
        to commit, using this same IndexWriter instance after it had received
        an exception, it will write to a new segments_N file.

        > The "write once" semantics are not kept for segment names or
        > .delN files. This is OK because no reader will read the old versions.

        Right, I think these cases are less important because readers would never try to
        open those partially written files (since no segments_N references them).

        Michael McCandless added a comment -

        Closing all issues that were resolved for 2.1.


          People

          • Assignee: Michael McCandless
          • Reporter: Michael McCandless
          • Votes: 0
          • Watchers: 1
