Lucene - Core / LUCENE-1262

IndexOutOfBoundsException from FieldsReader after problem reading the index

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.3.2, 2.4
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When reading a document through Hits fails with an IOException, the next call throws a NullPointerException instead of another IOException.

      Example stack traces:

      java.io.IOException: The specified network name is no longer available
      at java.io.RandomAccessFile.readBytes(Native Method)
      at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
      at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:536)
      at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:74)
      at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:220)
      at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:93)
      at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
      at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:57)
      at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:88)
      at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
      at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
      at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
      at org.apache.lucene.search.Hits.doc(Hits.java:104)

      That error is fine. The problem is the next call to doc generates:

      java.lang.NullPointerException
      at org.apache.lucene.index.FieldsReader.getIndexType(FieldsReader.java:280)
      at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:216)
      at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:101)
      at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:344)
      at org.apache.lucene.index.IndexReader.document(IndexReader.java:368)
      at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:84)
      at org.apache.lucene.search.Hits.doc(Hits.java:104)

      Presumably FieldsReader is caching partially-initialised data somewhere. I would normally expect the exact same IOException to be thrown for subsequent calls to the method.

      1. LUCENE-1262.patch
        6 kB
        Michael McCandless
      2. Test.java
        2 kB
        Trejkaz

        Activity

        mikemccand Michael McCandless added a comment -

        Those stack traces look like 2.1 not 2.3.1. Is that right?

        Can you post the index that you are using and the code that results in the 2nd exception? I can't get the 2nd exception to happen in a test case...

        trejkaz Trejkaz added a comment -

        Whoops. I don't think it's 2.1 but it must be 2.2.

        I'll try to reproduce this standalone, but first I need a way to make readInternal throw an exception. I presume you were using some kind of custom store implementation to do that. I'll see if I can make it happen under 2.2 and then try the same thing under 2.3.1 to confirm whether it still breaks.

        trejkaz Trejkaz added a comment -

        Okay I'll eat my words now, it is indeed 2.1 as the version doesn't have openInput(String,int) in it.

        Anyway an update: I've managed to reproduce it on any text index by simulating random network outage. I'm keeping a flag which I set to true. The trick is that the wrapping IndexInput implementation randomly throws IOException if the flag is true – if it always throws IOException the problem doesn't occur. If it randomly throws it then it occurs occasionally, and it always seems to be for larger queries (I'm using MatchAllDocsQuery now.)

        I'll see if I can tweak the code to make it more likely to happen and then start working up to each version of Lucene to see if it stops happening somewhere.
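
The simulation described in this comment can be sketched roughly as follows. This is not the attached Test.java, and it does not use Lucene's IndexInput: it is a minimal stand-in built on java.io.FilterInputStream, assuming a hypothetical class name (FlakyInputStream), a hypothetical outage flag, and a configurable failure probability. The point it illustrates is that reads fail only some of the time while the flag is set, so a caller can see a failed read followed by a read that succeeds.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

/** Hypothetical wrapper: fails reads at random while the outage flag is set. */
class FlakyInputStream extends FilterInputStream {
    private final Random random;
    private final double failureProbability; // chance that one read fails during an outage
    volatile boolean outage = false;         // assumed flag, flipped by the test driver

    FlakyInputStream(InputStream in, Random random, double failureProbability) {
        super(in);
        this.random = random;
        this.failureProbability = failureProbability;
    }

    // Throws before any bytes are consumed, so a failed read leaves the stream position unchanged.
    private void maybeFail() throws IOException {
        if (outage && random.nextDouble() < failureProbability) {
            throw new IOException("simulated network outage");
        }
    }

    @Override
    public int read() throws IOException {
        maybeFail();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        maybeFail();
        return super.read(b, off, len);
    }
}

class FlakyInputStreamDemo {
    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        FlakyInputStream in =
            new FlakyInputStream(new ByteArrayInputStream(data), new Random(), 0.5);
        in.outage = true;

        int bytesRead = 0;
        int failures = 0;
        while (true) {
            try {
                if (in.read() == -1) {
                    break;            // end of data
                }
                bytesRead++;
            } catch (IOException e) {
                failures++;           // retry: the next attempt may succeed
            }
        }
        System.out.println(bytesRead + " bytes read, " + failures + " simulated failures");
    }
}
```

With a failure probability of 1.0 every read fails while the flag is set, which corresponds to the "always throws" case that did not trigger the problem; a value between 0 and 1 gives the intermittent behaviour that did.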

        trejkaz Trejkaz added a comment -

        I managed to reproduce the problem as-is under version 2.2.

        For 2.3 the problem has changed – instead of a NullPointerException it is now an IndexOutOfBoundsException:

        Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 52, Size: 34
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:154)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:525)
        at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:92)
        at org.apache.lucene.search.Hits.doc(Hits.java:167)
        at Test.main(Test.java:24)

        Will attach my test program in a moment.

        trejkaz Trejkaz added a comment -

        Attaching a test program to reproduce the problem under 2.3.1.

        It occurs in approximately 1 in every 4 executions for any reasonably large text index (really small ones don't seem to do it, so I couldn't attach a text index with it). The number of fields may be related: judging by the numbers in the IndexOutOfBoundsException, the indexes we have happen to contain a large number of fields.

        mikemccand Michael McCandless added a comment -

        OK indeed I can get the failure to happen, using your Test running against a partial Wikipedia index I have. I'll pursue! Thanks Trejkaz.

        mikemccand Michael McCandless added a comment -

        Attached patch. All tests pass. I plan to commit in a day or so, to
        both trunk (2.4) and 2.3.X branch (2.3.2).

        I got the failure to happen with a standalone test case, added to
        TestFieldsReader.

        I found & fixed the issue. It's in BufferedIndexInput's refill()
        method. The problem is that the method changes bufferLength even if an
        exception is hit. This leaves incorrect bytes in the buffer such that
        a subsequent readByte will return the incorrect bytes.

        The fix is simple: use a local "int newLength" and only assign that
        value to bufferLength if the readInternal() call succeeds. The test
        fails without the fix and passes with it.
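
The idea behind the fix can be illustrated with a simplified, standalone sketch. This is not the committed patch and not Lucene's actual class: SketchBufferedInput, the array-backed "file", the failReads switch, and the exact shapes of seek() and readInternal() are assumptions made for illustration. Only the names bufferLength, newLength, refill(), readByte and readInternal() come from the comment above. The sketch computes the new length in a local variable and updates the buffer state only after the read has succeeded, so a failed refill leaves the reader in its previous state.

```java
import java.io.IOException;

/** Simplified stand-in for a buffered input; hypothetical, not Lucene's class. */
class SketchBufferedInput {
    private final byte[] file;       // backing "file" contents
    private final byte[] buffer;
    private long bufferStart = 0;    // file position of buffer[0]
    private int bufferLength = 0;    // number of valid bytes in buffer
    private int bufferPosition = 0;  // next byte to read within buffer
    boolean failReads = false;       // hypothetical switch simulating an outage

    SketchBufferedInput(byte[] file, int bufferSize) {
        this.file = file;
        this.buffer = new byte[bufferSize];
    }

    byte readByte() throws IOException {
        if (bufferPosition >= bufferLength) {
            refill();
        }
        return buffer[bufferPosition++];
    }

    void seek(long pos) {
        if (pos >= bufferStart && pos < bufferStart + bufferLength) {
            bufferPosition = (int) (pos - bufferStart); // still inside the buffer
        } else {
            bufferStart = pos;
            bufferPosition = 0;
            bufferLength = 0;                           // force a refill on next read
        }
    }

    private void refill() throws IOException {
        long start = bufferStart + bufferPosition;
        long end = Math.min(start + buffer.length, file.length);
        int newLength = (int) (end - start);            // local, not the field
        if (newLength <= 0) {
            throw new IOException("read past EOF");
        }
        readInternal(buffer, 0, newLength, start);      // may throw
        // Only reached when the read succeeded: commit the new state together.
        bufferLength = newLength;
        bufferStart = start;
        bufferPosition = 0;
    }

    private void readInternal(byte[] dest, int offset, int len, long filePos)
            throws IOException {
        if (failReads) {
            throw new IOException("simulated network failure");
        }
        System.arraycopy(file, (int) filePos, dest, offset, len);
    }
}

class RefillSketchDemo {
    public static void main(String[] args) throws IOException {
        byte[] file = new byte[256];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }
        SketchBufferedInput in = new SketchBufferedInput(file, 16);
        in.readByte();        // fills the buffer with bytes 0..15
        in.seek(100);         // outside the buffer, so the next read must refill
        in.failReads = true;
        for (int attempt = 1; attempt <= 2; attempt++) {
            try {
                in.readByte();
            } catch (IOException e) {
                System.out.println("attempt " + attempt + ": " + e.getMessage());
            }
        }
        in.failReads = false;
        System.out.println(in.readByte()); // prints 100, the byte at the seeked position
    }
}
```

If refill() instead assigned bufferLength before calling readInternal(), the first failed read after the seek would leave bufferLength at 16 with bufferPosition at 0, and the second readByte would return a stale byte from the old buffer contents rather than throwing again.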

        mikemccand Michael McCandless added a comment -

        I just committed this. Thanks Trejkaz!


          People

          • Assignee:
            Unassigned
            Reporter:
            trejkaz Trejkaz