Lucene - Core / LUCENE-9867

CorruptIndexException after failed segment merge caused by No space left on device

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 8.5
    • Fix Version/s: None
    • Component/s: core/store
    • Labels: None
    • Lucene Fields: New

    Description

      A segment merge that fails with "No space left on device" leaves the index in a state that cannot be recovered: after a restart, Lucene fails with CorruptIndexException. The expectation is that Lucene can restart automatically, without manual intervention.

      We have two indexing patterns:

      • Create and commit an empty index, then start a long initial indexing process (it might take hours) and perform a second commit at the end.
      • Open an existing index, add no more than 4k documents, then commit.

      Right now we have no evidence indicating which pattern caused this issue. We did, however, witness a similar situation with the second pattern, although it differed in two ways: it was caused by OutOfMemoryError: Java heap space, and the missing _q.cfe file produced only a NoSuchFileException, not a CorruptIndexException. Please let me know if that needs a separate ticket.

      Lucene version: 8.5.0
      Java version: OpenJDK 11

      OS: CentOS Linux 7
      Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
      Virtualization: kvm
      Filesystem: xfs

      Failed merge stacktrace:

      2021-02-02T08:51:51.679+0000
      org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
      	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
      	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
      Caused by: java.io.IOException: No space left on device
      	at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
      	at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
      	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
      	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
      	at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
      	at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
      	at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
      	at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
      	at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
      	at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
      	at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
      	at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
      	at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
      	at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
      	at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
      	at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
      	at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
      	at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
      	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
      	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
      	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
      	at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
      	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
      	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
      	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
      	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
      	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
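The trace shows the merge thread running out of space while writing stored fields for the merged segment. A merge can temporarily need extra free space roughly on the order of the segments being merged, because the source segments stay on disk until they are no longer referenced by a commit or an open reader. As a mitigation only (it does not address the corruption on restart), an application could check headroom before a long indexing run or a commit. The sketch below uses only JDK file-system APIs; the class DiskSpaceGuard and its safetyFactor parameter are hypothetical, not Lucene APIs, and the factor to use is an assumption to tune for a given merge policy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: a pre-flight free-space check
// for an index directory, using only JDK file-system APIs.
public class DiskSpaceGuard {

    /** Sums the sizes of the regular files directly inside indexDir. */
    static long indexSizeBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (NoSuchFileException e) {
                            // Deleted while we were listing (e.g. after a merge); skip it.
                            return 0L;
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    /**
     * Returns true if the file store holding indexDir has at least
     * safetyFactor * (current index size) bytes usable by this JVM.
     * safetyFactor is an assumed tuning knob, not a Lucene setting.
     */
    static boolean hasHeadroom(Path indexDir, double safetyFactor) throws IOException {
        FileStore store = Files.getFileStore(indexDir);
        long usable = store.getUsableSpace();
        long needed = (long) Math.ceil(indexSizeBytes(indexDir) * safetyFactor);
        return usable >= needed;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory holding one fake 4 KiB index file.
        Path dir = Files.createTempDirectory("index-guard-demo");
        Files.write(dir.resolve("_0.cfs"), new byte[4096]);
        System.out.println("index bytes: " + indexSizeBytes(dir)); // 4096
        System.out.println("headroom for 2x: " + hasHeadroom(dir, 2.0));
    }
}
```

The factor 2.0 in main is an arbitrary example, not a documented Lucene requirement. In the first indexing pattern, such a check could run before the final commit, so the application stops cleanly instead of letting a merge hit ENOSPC.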
      

      Followed by the failed startup:

      2021-02-02T08:52:07.926+0000
      org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/5f91aa0b07ce4d5e7beffaa2/segments_578fu")))
      	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
      	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
      Caused by: java.nio.file.NoSuchFileException: /data/5f91aa0b07ce4d5e7beffaa2/_6lfem.si
      	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
      	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
      	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
      	at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
      	at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
      	at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
      	at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
      	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
      	at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
      	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:353)
      	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
      	... 33 common frames omitted
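The failing commit file, segments_578fu, references a segment info file, _6lfem.si, that no longer exists. To my understanding, Lucene writes both the commit generation in segments_N and the segment counter in _N names in base 36 (Character.MAX_RADIX), so these names can be decoded and compared with the other files left in the directory. The JDK-only sketch below does that; LuceneFileNames and its methods are hypothetical helpers, not Lucene API, the directory listing in main is invented for illustration, and the code does not parse the binary contents of segments_N.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helpers, not Lucene API: decode the base-36 numbers
// embedded in commit-point and segment file names.
public class LuceneFileNames {

    private static final String SEGMENTS_PREFIX = "segments_";

    /** Generation of a "segments_N" file, or empty for any other file name. */
    static OptionalLong commitGeneration(String fileName) {
        if (!fileName.startsWith(SEGMENTS_PREFIX)) {
            // Skips per-segment files (e.g. _6lfem.si) and pending_segments_N.
            return OptionalLong.empty();
        }
        String suffix = fileName.substring(SEGMENTS_PREFIX.length());
        return OptionalLong.of(Long.parseLong(suffix, Character.MAX_RADIX));
    }

    /** Counter encoded in a segment name such as "_6lfem". */
    static long segmentCounter(String segmentName) {
        if (!segmentName.startsWith("_")) {
            throw new IllegalArgumentException("not a segment name: " + segmentName);
        }
        return Long.parseLong(segmentName.substring(1), Character.MAX_RADIX);
    }

    /** Highest commit generation among the given directory entries. */
    static OptionalLong latestCommitGeneration(List<String> fileNames) {
        return fileNames.stream()
                .map(LuceneFileNames::commitGeneration)
                .filter(OptionalLong::isPresent)
                .mapToLong(OptionalLong::getAsLong)
                .max();
    }

    public static void main(String[] args) {
        System.out.println(commitGeneration("segments_578fu").getAsLong()); // 8735610
        System.out.println(segmentCounter("_6lfem"));                        // 11077438
        // Invented listing, for illustration only.
        List<String> listing = List.of("segments_578ft", "segments_578fu", "_6lfem.cfs");
        System.out.println(latestCommitGeneration(listing).getAsLong());     // 8735610
    }
}
```

Decoding names shows how the broken commit relates to what survived on disk, but it cannot tell which segments a commit actually references; that requires Lucene itself, for example the CheckIndex tool.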
      

          People

            Assignee: Unassigned
            Reporter: Alexander Lukyanchikov (sqshq)
            Votes: 0
            Watchers: 6