Details
Bug
Status: Open
Major
Resolution: Unresolved
8.5
None
None
New
Description
A segment merge that fails with "No space left on device" leaves the index in a state that cannot be recovered: after a restart, Lucene fails with CorruptIndexException. We expect Lucene to be able to restart automatically, without manual intervention.
We have two indexing patterns:
- Create and commit an empty index, then start a long initial indexing process (which might take hours) and perform a second commit at the end.
- Using an existing index, add no more than 4k documents and then commit.
We currently have no evidence showing which pattern caused this issue. However, we did witness a similar failure with the second pattern. That case was a bit different: it was caused by OutOfMemoryError: Java heap space, and the missing _q.cfe file produced only a NoSuchFileException, not a CorruptIndexException. Please let me know if we need a separate ticket for that.
Lucene version: 8.5.0
Java version: OpenJDK 11
OS: CentOS Linux 7
Kernel: Linux 3.10.0-1160.11.1.el7.x86_64
Virtualization: kvm
Filesystem: xfs
Failed merge stacktrace:
2021-02-02T08:51:51.679+0000 org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:704)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
Caused by: java.io.IOException: No space left on device
    at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
    at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
    at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
    at java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
    at java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
    at java.base/java.nio.channels.Channels$1.write(Channels.java:172)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:416)
    at java.base/java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:74)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
    at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
    at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
    at org.apache.lucene.util.compress.LZ4.encodeLiterals(LZ4.java:159)
    at org.apache.lucene.util.compress.LZ4.encodeSequence(LZ4.java:172)
    at org.apache.lucene.util.compress.LZ4.compress(LZ4.java:441)
    at org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:165)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:229)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:159)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:636)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:229)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4463)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4057)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:625)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:662)
Followed by failed startup:
2021-02-02T08:52:07.926+0000 org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/5f91aa0b07ce4d5e7beffaa2/segments_578fu")))
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
Caused by: java.nio.file.NoSuchFileException: /data/5f91aa0b07ce4d5e7beffaa2/_6lfem.si
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182)
    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:292)
    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:345)
    at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
    at org.apache.lucene.codecs.lucene70.Lucene70SegmentInfoFormat.read(Lucene70SegmentInfoFormat.java:91)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:353)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
    ... 33 common frames omitted
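Since the first trace shows the merge dying on a full disk, one possible stopgap on the application side is to check free space on the index filesystem before starting a large indexing batch. The sketch below uses only the JDK (no Lucene calls). The class name DiskSpaceGuard, its methods, and the 10 GiB threshold are hypothetical and chosen for illustration. It does not address the recovery problem reported here, and it cannot guarantee that a later merge will fit, because merges write new segment files before the old ones are deleted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene: a pre-flight free-space check.
class DiskSpaceGuard {

    // Returns the bytes available to this JVM on the filesystem that holds dir.
    static long usableBytes(Path dir) {
        try {
            FileStore store = Files.getFileStore(dir);
            return store.getUsableSpace();
        } catch (IOException e) {
            // Wrapped so callers do not need a checked-exception handler.
            throw new UncheckedIOException(e);
        }
    }

    // True if at least requiredBytes are currently usable next to dir.
    static boolean hasHeadroom(Path dir, long requiredBytes) {
        return usableBytes(dir) >= requiredBytes;
    }

    public static void main(String[] args) {
        // Index directory to check; defaults to the working directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        // Assumed threshold (10 GiB); tune it to the expected index size.
        long required = 10L * 1024 * 1024 * 1024;

        if (hasHeadroom(indexDir, required)) {
            System.out.println("Enough free space to start indexing");
        } else {
            System.out.println("Low disk space, postponing indexing: "
                    + usableBytes(indexDir) + " bytes usable");
        }
    }
}
```

The check is only a snapshot: other processes can fill the disk after it passes, so the failed-restart behaviour described above still needs a fix in its own right.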