Lucene - Core: LUCENE-2593

disk full can cause index corruption in certain cases

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.4, 3.0.3, 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

      Description

      Robert uncovered this nasty bug, in adding more randomness to
      oal.index tests...

      I got a standalone test to show the issue; the corruption path is
      as follows:

      • The merge hits an initial exception (eg disk full when merging the
        postings).
      • In handling this exception, IW closes all the sub-readers,
        suppressing any further exceptions.
      • If one of these sub-readers has pending deletions, which happens
        if readers are pooled in IW, it will flush them. If that flush
        hits a 2nd exception (eg disk full), then SegmentReader
        [incorrectly] leaves the SegmentInfo's delGen advanced by 1,
        referencing a corrupt file, yet the SegmentReader is still
        forcefully closed.
      • If enough disk frees up such that a later IW.commit/close
        succeeds, the resulting segments file will reference an invalid
        deletions file.
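
      The corruption path above can be illustrated with a small, self-contained model. This is only a sketch: `SegmentMeta`, `FakeDirectory`, and the method names are hypothetical stand-ins invented for illustration, not Lucene's actual `SegmentInfo`, `Directory`, or `SegmentReader` code. It assumes the deletions generation is advanced before the file is written and is never rolled back when the write fails.

      ```java
      import java.io.IOException;
      import java.util.HashSet;
      import java.util.Set;

      public class DelGenCorruptionSketch {

          /** Hypothetical stand-in for a segment's metadata (not Lucene's SegmentInfo). */
          static final class SegmentMeta {
              final String name;
              long delGen = 0; // in this model, 0 means "no deletions file"

              SegmentMeta(String name) {
                  this.name = name;
              }

              String delFileName() {
                  return delGen == 0 ? null : name + "_" + delGen + ".del";
              }
          }

          /** Hypothetical directory that can simulate a full disk. */
          static final class FakeDirectory {
              final Set<String> completeFiles = new HashSet<>();
              boolean diskFull = false;

              void writeFile(String fileName) throws IOException {
                  if (diskFull) {
                      throw new IOException("disk full while writing " + fileName);
                  }
                  completeFiles.add(fileName);
              }
          }

          /** Buggy ordering: delGen is advanced first and not restored on failure. */
          static void flushDeletesBuggy(SegmentMeta meta, FakeDirectory dir) throws IOException {
              meta.delGen++;
              dir.writeFile(meta.delFileName());
          }

          /** Models closing a sub-reader while suppressing any further exception. */
          static void closeSuppressingExceptions(SegmentMeta meta, FakeDirectory dir) {
              try {
                  flushDeletesBuggy(meta, dir);
              } catch (IOException suppressed) {
                  // The 2nd exception is swallowed; meta.delGen stays advanced.
              }
          }

          /** True if a later commit would reference a deletions file that was never fully written. */
          static boolean commitReferencesMissingFile(SegmentMeta meta, FakeDirectory dir) {
              String fileName = meta.delFileName();
              return fileName != null && !dir.completeFiles.contains(fileName);
          }

          public static void main(String[] args) {
              SegmentMeta meta = new SegmentMeta("_0");
              FakeDirectory dir = new FakeDirectory();
              dir.diskFull = true;                   // disk full during the merge's cleanup
              closeSuppressingExceptions(meta, dir); // flush of pending deletes fails silently
              dir.diskFull = false;                  // disk frees up before the next commit
              System.out.println("dangling reference: " + commitReferencesMissingFile(meta, dir));
          }
      }
      ```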
      1. LUCENE-2593.patch
        17 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Attached patch; I ended up making a number of defensive fixes on how
        IW/DW/SR handle exceptions:

        • Generally I moved the error recovery down lower, eg SegmentReader
          now restores its SegmentInfo and deletes the partially written
          file, on hitting an exception writing changed norms or deletes.
        • IW's ReaderPool no longer forcefully drops changes if it hits an
          exception committing an SR. The SR now remains pooled, holding
          onto its changes, in case a future commit is attempted and the SR
          is able to commit.
        • We checkpoint with IndexFileDeleter more "finely" now, so that as
          soon as a new file is referenced (eg from writing deletes), IFD
          knows about it. This prevents incorrect deletion of a file eg if
          a merge IncRefs and then DecRefs before we can checkpoint.

        I believe this issue goes back to 2.9, when we added reader pooling
        (for NRT).
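
        The first two fixes described in this comment (restoring the segment metadata and deleting the partial file on failure, and keeping the reader's changes pending so a later commit can retry) might look roughly like the sketch below. It is a simplified model, not the patch itself: `SegmentMeta`, `FakeDirectory`, and `PooledReader` are hypothetical names, and the real SegmentReader/ReaderPool logic is more involved.

        ```java
        import java.io.IOException;
        import java.util.HashSet;
        import java.util.Set;

        public class DelGenRollbackSketch {

            /** Hypothetical stand-in for a segment's metadata (not Lucene's SegmentInfo). */
            static final class SegmentMeta {
                final String name;
                long delGen = 0; // in this model, 0 means "no deletions file"

                SegmentMeta(String name) {
                    this.name = name;
                }

                String delFileName() {
                    return delGen == 0 ? null : name + "_" + delGen + ".del";
                }
            }

            /** Hypothetical directory; a failed write leaves a partial file behind. */
            static final class FakeDirectory {
                final Set<String> files = new HashSet<>();
                boolean diskFull = false;

                void writeFile(String fileName) throws IOException {
                    files.add(fileName); // the file exists as soon as writing starts
                    if (diskFull) {
                        throw new IOException("disk full while writing " + fileName);
                    }
                }

                void deleteFile(String fileName) {
                    files.remove(fileName);
                }
            }

            /** Hypothetical pooled reader that keeps its pending deletes until a commit succeeds. */
            static final class PooledReader {
                final SegmentMeta meta;
                boolean hasPendingDeletes;

                PooledReader(SegmentMeta meta, boolean hasPendingDeletes) {
                    this.meta = meta;
                    this.hasPendingDeletes = hasPendingDeletes;
                }

                void commitDeletes(FakeDirectory dir) throws IOException {
                    if (!hasPendingDeletes) {
                        return;
                    }
                    long savedDelGen = meta.delGen;
                    meta.delGen++;
                    String fileName = meta.delFileName();
                    boolean success = false;
                    try {
                        dir.writeFile(fileName);
                        success = true;
                    } finally {
                        if (!success) {
                            // Recover at the lowest level: restore metadata, remove the partial file.
                            meta.delGen = savedDelGen;
                            dir.deleteFile(fileName);
                        }
                    }
                    // Only reached on success; on failure the changes stay pending for a retry.
                    hasPendingDeletes = false;
                }
            }

            public static void main(String[] args) {
                FakeDirectory dir = new FakeDirectory();
                PooledReader reader = new PooledReader(new SegmentMeta("_0"), true);
                dir.diskFull = true;
                try {
                    reader.commitDeletes(dir);
                } catch (IOException e) {
                    System.out.println("first commit failed: " + e.getMessage());
                }
                dir.diskFull = false;
                try {
                    reader.commitDeletes(dir); // the retry can succeed once disk frees up
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                System.out.println("deletions file: " + reader.meta.delFileName());
            }
        }
        ```

        The success flag with a finally block keeps the rollback next to the write that can fail, so callers higher up do not need to know which files or counters to undo.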

        Earwin Burrfoot added a comment -

        Yeehaw! This looks very much like a bug I was experiencing, which we earlier attributed to me fiddling with fsync.


          People

          • Assignee: Michael McCandless
          • Reporter: Michael McCandless
