Lucene - Core
  LUCENE-2327

IndexOutOfBoundsException in FieldInfos.java

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 3.0.1
    • Fix Version/s: None
    • Component/s: core/index
    • Environment:

      Fedora 12

    • Lucene Fields:
      New

      Description

      When retrieving the scoreDocs from a MultiSearcher, the following exception is thrown:

      java.lang.IndexOutOfBoundsException: Index: 52, Size: 4
      at java.util.ArrayList.rangeCheck(ArrayList.java:571)
      at java.util.ArrayList.get(ArrayList.java:349)
      at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:285)
      at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:274)
      at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
      at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
      at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:162)
      at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
      at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
      at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:911)
      at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:644)

      The error occurs when the fieldNumber passed to FieldInfos.fieldInfo() is greater than or equal to the size of the ArrayList holding the FieldInfo values. I am not sure what the field number represents or why it would exceed the list's size. The quick fix would be to validate the bounds, but there may be a bigger underlying problem. The issue does appear to be directly related to LUCENE-939. I've only been able to reproduce this in my production environment, so I can't provide a good test case.
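      A minimal sketch of the failure mode, assuming (as the stack trace suggests) that FieldInfos keeps a segment's FieldInfo entries in an ArrayList indexed by field number. ToyFieldInfos and FieldNumberDemo are hypothetical stand-ins, not Lucene classes: a segment with 4 fields is asked for field number 52, and ArrayList.get rejects any index outside the range [0, size()).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a segment's field table; not Lucene's FieldInfos.
class ToyFieldInfos {
    private final List<String> byNumber = new ArrayList<>();

    ToyFieldInfos(String... names) {
        for (String name : names) {
            byNumber.add(name); // a field's number is its position in the list
        }
    }

    int size() {
        return byNumber.size();
    }

    // Unchecked lookup, like the one in the stack trace: any number outside
    // [0, size()) makes ArrayList.get throw IndexOutOfBoundsException.
    String fieldName(int fieldNumber) {
        return byNumber.get(fieldNumber);
    }
}

class FieldNumberDemo {
    public static void main(String[] args) {
        ToyFieldInfos infos = new ToyFieldInfos("id", "title", "body", "date");
        int fromTermsDict = 52; // hypothetical value read from a bad term entry
        try {
            System.out.println(infos.fieldName(fromTermsDict));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

      The exact exception message varies by JDK version, but the condition is the one reported above: index 52 against a list of size 4.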

        Activity

        Trejkaz added a comment -

        I have an almost identical stack trace from v3.6, but I got the index from someone else, so I don't know where they were storing it.

        java.lang.IndexOutOfBoundsException: Index: 100, Size: 64
          at java.util.ArrayList.rangeCheck(ArrayList.java:635)
          at java.util.ArrayList.get(ArrayList.java:411)
          at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:255)
          at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:244)
          at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
          at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:133)
          at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:174)
          at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
          at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
          at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:539)
          at org.apache.lucene.search.TermQuery$TermWeight$1.add(TermQuery.java:56)
          at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:81)
          at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:87)
          at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:70)
          at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:53)
          at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:199)
          at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:176)
          at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:354)
          at org.apache.lucene.search.Searcher.createNormalizedWeight(Searcher.java:168)
          at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:664)
          at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:364)
        
        Michael McCandless added a comment -

        OK, I'm resolving this as optimistically invalid.

        Shane added a comment -

        I believe we were storing the index on a NAS via NFS at the time. If memory serves, there were known issues with running Lucene over NFS back then. We were also experiencing file system problems, so we have since moved to a different architecture.

        Also, I was aware that the fix drops the segments, but thanks anyway.

        Michael McCandless added a comment -

        Yikes: you had 10 corrupted segments (out of 23), and there are at least 4 different flavors of corruption across those segments! Curious... What storage device did you store the index on?

        Note that the "fix" just drops those segments from the index, so any docs that were in them are lost.
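        A toy model of that trade-off, assuming hypothetical SegmentSummary and DropCorruptSegments types (this is not CheckIndex's code, and the numbers in the demo are made up): the rewritten index keeps only the segments that passed the check, and the documents in every dropped segment are gone rather than recovered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-segment summary, as a checking tool might report it.
class SegmentSummary {
    final String name;
    final int docCount;
    final boolean corrupt;

    SegmentSummary(String name, int docCount, boolean corrupt) {
        this.name = name;
        this.docCount = docCount;
        this.corrupt = corrupt;
    }
}

// Toy model of a "fix" that drops corrupt segments instead of repairing them.
class DropCorruptSegments {
    // The segments that survive the fix: only those that passed the check.
    static List<SegmentSummary> keepClean(List<SegmentSummary> segments) {
        List<SegmentSummary> kept = new ArrayList<>();
        for (SegmentSummary s : segments) {
            if (!s.corrupt) {
                kept.add(s);
            }
        }
        return kept;
    }

    // Documents that disappear together with the dropped segments.
    static int docsLost(List<SegmentSummary> segments) {
        int lost = 0;
        for (SegmentSummary s : segments) {
            if (s.corrupt) {
                lost += s.docCount;
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<SegmentSummary> segments = new ArrayList<>();
        segments.add(new SegmentSummary("seg-a", 1000, false)); // made-up data
        segments.add(new SegmentSummary("seg-b", 250, true));
        segments.add(new SegmentSummary("seg-c", 400, true));
        System.out.println("segments kept: " + keepClean(segments).size()
                + ", docs lost: " + docsLost(segments));
    }
}
```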

        Shane added a comment -

        The index is relatively old and doesn't appear to have been modified for a number of years. I can't say for certain whether there were prior exceptions. If the CheckIndex results provide any more details, then great. Regardless, I'm willing to chalk this up to a system-specific error and close the ticket. I was able to fix the index using Luke.

        Shane added a comment -

        CheckIndex output generated by Luke v1.0.0.

        Michael McCandless added a comment -

        This exception looks like index corruption... It would be good to get to the root cause of how this happened.

        Your terms dict, which records the field number and character data for each term, has somehow recorded a field number of 52 when in fact this segment appears to only have 4 fields.

        Can you run CheckIndex on the index and post the result back?

        Any prior exceptions when creating this index?

        I don't think adding a bounds check to FieldInfos makes sense; the best we could do is throw a "FieldNumberOutOfBounds" exception.
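        A sketch of what such a check could look like. FieldNumberOutOfBoundsException is hypothetical (the name comes from the comment above, not from Lucene), as is CheckedFieldInfos. It repairs nothing; it only replaces the raw ArrayList error with a message that names the bad field number and the segment's field count and points at possible corruption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical exception named after the suggestion above; not a Lucene class.
class FieldNumberOutOfBoundsException extends RuntimeException {
    FieldNumberOutOfBoundsException(int fieldNumber, int numFields) {
        super("field number " + fieldNumber + " is out of bounds for a segment with "
                + numFields + " fields; the index may be corrupt (try CheckIndex)");
    }
}

// Hypothetical field table whose lookup validates the field number first.
class CheckedFieldInfos {
    private final List<String> byNumber;

    CheckedFieldInfos(String... names) {
        this.byNumber = new ArrayList<>(Arrays.asList(names));
    }

    String fieldName(int fieldNumber) {
        // Valid numbers are 0 .. size()-1. In this model an out-of-range number
        // means the value read from disk is bad, so fail with a clear message.
        if (fieldNumber < 0 || fieldNumber >= byNumber.size()) {
            throw new FieldNumberOutOfBoundsException(fieldNumber, byNumber.size());
        }
        return byNumber.get(fieldNumber);
    }
}
```

        This matches the comment's point: the check adds a comparison to every lookup and still cannot tell which field the term really belonged to, so a clearer error message is its only benefit.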


          People

          • Assignee: Unassigned
          • Reporter: Shane
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved:
