  1. Geode
  2. GEODE-7703

Lucene IndexWriter Creation Failure


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: lucene

    Description

      While computing the index repository, initialization can fail if the fileAndChunk region is modified while the IndexWriter is being initialized.
      The exception stack trace varies from run to run, but it always involves an IOException, with varying causes, thrown while reading an index file. Some examples are shown below:

      Caused by: java.io.FileNotFoundException: segments_1
      	at org.apache.geode.cache.lucene.internal.filesystem.FileSystem.getFile(FileSystem.java:101)
      	at org.apache.geode.cache.lucene.internal.directory.RegionDirectory.openInput(RegionDirectory.java:115)
      	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:137)
      	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:286)
      	at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:165)
      	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:974)
      	at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:130)
      	at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:67)
      	at org.apache.geode.cache.lucene.internal.IndexRepositoryFactoryDistributedTest.lambda$testBecomePrimaryWhileIndexing$566b4a0f$5(IndexRepositoryFactoryDistributedTest.java:224)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
      	at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
      	at sun.rmi.transport.Transport$1.run(Transport.java:200)
      	at sun.rmi.transport.Transport$1.run(Transport.java:197)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
      	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
      Caused by: java.io.EOFException: Read past end of file _3z.si
      	at org.apache.geode.cache.lucene.internal.directory.FileIndexInput.readByte(FileIndexInput.java:103)
      	at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
      	at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
      	at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
      	at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
      	at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:93)
      	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
      	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
      	... 20 more
      	Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: unexpected exception (resource=BufferedChecksumIndexInput(_3z.si))
      		at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:471)
      		at org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:252)
      		... 22 more
      	Caused by: java.io.EOFException: Read past end of file _3z.si
      		at org.apache.geode.cache.lucene.internal.directory.FileIndexInput.readBytes(FileIndexInput.java:124)
      		at org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:49)
      		at org.apache.lucene.store.DataInput.readBytes(DataInput.java:87)
      		at org.apache.lucene.store.DataInput.skipBytes(DataInput.java:350)
      		at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:458)
      		... 23 more
      

      The issue is extremely hard to reproduce because the time window for the race is very small. The fix is to return null from the IndexRepositoryFactory whenever the exception occurs and let the caller retry (the internal logic for retrying is already in place).
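
      The return-null-and-retry approach can be sketched roughly as follows. This is a minimal illustration, not the Geode implementation: Repository, WriterOpener, RetryingRepositoryFactory, and both method names are hypothetical stand-ins, and the retry loop only approximates what the caller's existing logic is described as doing.

```java
import java.io.IOException;

/** Hypothetical stand-in for the index repository type; not Geode's API. */
final class Repository {
  final String name;

  Repository(String name) {
    this.name = name;
  }
}

/** Hypothetical stand-in for the step that opens the IndexWriter over the region-backed directory. */
interface WriterOpener {
  Repository open() throws IOException;
}

final class RetryingRepositoryFactory {

  /**
   * Tries to build the repository once. Any IOException (for example a
   * FileNotFoundException or EOFException caused by concurrent modification
   * of the underlying files) is converted into a null result, signalling
   * the caller to try again later.
   */
  static Repository computeRepository(WriterOpener opener) {
    try {
      return opener.open();
    } catch (IOException e) {
      return null;
    }
  }

  /**
   * Assumed caller-side behavior: retry up to maxAttempts times while the
   * factory keeps returning null. Returns null if every attempt fails.
   */
  static Repository computeWithRetry(WriterOpener opener, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Repository repository = computeRepository(opener);
      if (repository != null) {
        return repository;
      }
    }
    return null;
  }
}
```

      Both FileNotFoundException and EOFException extend IOException, so a single catch clause covers the two traces shown above. A real caller would likely also log the exception and bound or delay the retries.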

            People

              echobravo Ernest Burghardt
              jjramos Juan Ramos
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m