Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-846

IOExeception can cause loss of data due to premature segment deletion

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      If you hit an IOException, e.g., disk full, while making a cfs from its constituent parts, you may not be able to rollback to the before-merge process. This happens via addIndexes.

      I don't have a nice easy test for this; generating IOEs ain't so easy. But it does happen in the patch for the factored merge policy with the existing tests because the pseudo-randomly generated IOEs fall in a different place.

      1. LUCENE-846-test.txt
        100 kB
        Steven Parkes

        Activity

        Hide
        mikemccand Michael McCandless added a comment -

        Hmmm, this does not sound good! I'll apply the patch and track it down.

        Show
        mikemccand Michael McCandless added a comment - Hmmm, this does not sound good! I'll apply the patch and track it down.
        Hide
        mikemccand Michael McCandless added a comment -

        Steve, I applied the patch from LUCENE-847 and all tests ran successfully. Do I need to do something else to see this issue happen?

        Show
        mikemccand Michael McCandless added a comment - Steve, I applied the patch from LUCENE-847 and all tests ran successfully. Do I need to do something else to see this issue happen?
        Hide
        steven_parkes Steven Parkes added a comment -

        Ahh ... meant to mention that. I disabled TestIndexWriter#testAddIndexOnDiskFull in the patch by prefixing the name with an underscore. Remove it to reenable the test.

        Show
        steven_parkes Steven Parkes added a comment - Ahh ... meant to mention that. I disabled TestIndexWriter#testAddIndexOnDiskFull in the patch by prefixing the name with an underscore. Remove it to reenable the test.
        Hide
        mikemccand Michael McCandless added a comment -

        OK, thanks.

        Hurm, I found the _, removed it, but the test still ran successfully. Now I'm getting nervous!

        Show
        mikemccand Michael McCandless added a comment - OK, thanks. Hurm, I found the _, removed it, but the test still ran successfully. Now I'm getting nervous!
        Hide
        steven_parkes Steven Parkes added a comment -

        Dang. It used to fail but I've made some updates since then that would change the merge policy (fancy that) and it's no longer happening.

        Let me see if I have a snapshot lying around that has the old code where it still happens.

        [As as aside, I've been thinking of looking into svk or brz (or whatever its called) that would allow me to version control stuff without it going into svn. All of a sudden I can see why the linux folks like distributed version control ...]

        Show
        steven_parkes Steven Parkes added a comment - Dang. It used to fail but I've made some updates since then that would change the merge policy (fancy that) and it's no longer happening. Let me see if I have a snapshot lying around that has the old code where it still happens. [As as aside, I've been thinking of looking into svk or brz (or whatever its called) that would allow me to version control stuff without it going into svn. All of a sudden I can see why the linux folks like distributed version control ...]
        Hide
        steven_parkes Steven Parkes added a comment -

        Okay, here's an old version of 847 that demonstrates the problem (at least it does for me.)

        No looking at the rest of my half-baked code.

        And there's a bunch of debugging turned on in the patch, too.

        Show
        steven_parkes Steven Parkes added a comment - Okay, here's an old version of 847 that demonstrates the problem (at least it does for me.) No looking at the rest of my half-baked code. And there's a bunch of debugging turned on in the patch, too.
        Hide
        mikemccand Michael McCandless added a comment -

        Excellent I can repro the problem as well, thanks Steven!

        OK I see what's going on here: if you open a writer with
        autoCommit=false, and you call addIndexes, and you hit an exception
        (eg disk full) during the addIndexes, the addIndexes call tries to
        rollback to the index state when it started but in doing so restores
        the writer's SegmentInfos instance to where it was when addIndexes
        started.

        The problem is when autoCommit is false those segments have often
        already been replaced/merged/etc. and are now gone.

        The good news is this bug was born with the fix for LUCENE-710, so, it
        has not been released yet.

        Also good news is this is straightforward to fix: the addIndexes calls
        just need to incRef this SegmentInfos' files at the start (and then
        decRef once done) to protect them from deletion.

        Show
        mikemccand Michael McCandless added a comment - Excellent I can repro the problem as well, thanks Steven! OK I see what's going on here: if you open a writer with autoCommit=false, and you call addIndexes , and you hit an exception (eg disk full) during the addIndexes, the addIndexes call tries to rollback to the index state when it started but in doing so restores the writer's SegmentInfos instance to where it was when addIndexes started. The problem is when autoCommit is false those segments have often already been replaced/merged/etc. and are now gone. The good news is this bug was born with the fix for LUCENE-710 , so, it has not been released yet. Also good news is this is straightforward to fix: the addIndexes calls just need to incRef this SegmentInfos' files at the start (and then decRef once done) to protect them from deletion.

          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            steven_parkes Steven Parkes
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development