Lucene - Core / LUCENE-4635

ArrayIndexOutOfBoundsException when a segment has many, many terms

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.2
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from Tom Burton-West's java-user thread "CheckIndex ArrayIndexOutOfBounds error for merged index" ( http://markmail.org/message/fatijkotwucn7hvu ).

      I modified Test2BTerms to instead generate a little over 10B terms, ran it (took 17 hours and created a 162 GB index) and hit a similar exception:

      Time: 62,164.058
      There was 1 failure:
      1) test2BTerms(org.apache.lucene.index.Test2BTerms)
      java.lang.ArrayIndexOutOfBoundsException: 1246
      	at org.apache.lucene.index.TermInfosReaderIndex.compareField(TermInfosReaderIndex.java:249)
      	at org.apache.lucene.index.TermInfosReaderIndex.compareTo(TermInfosReaderIndex.java:225)
      	at org.apache.lucene.index.TermInfosReaderIndex.getIndexOffset(TermInfosReaderIndex.java:156)
      	at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
      	at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
      	at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:539)
      	at org.apache.lucene.search.TermQuery$TermWeight$1.add(TermQuery.java:56)
      	at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:81)
      	at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:87)
      	at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:70)
      	at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:53)
      	at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:199)
      	at org.apache.lucene.search.Searcher.createNormalizedWeight(Searcher.java:168)
      	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:664)
      	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:342)
      	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:330)
      	at org.apache.lucene.index.Test2BTerms.testSavedTerms(Test2BTerms.java:205)
      	at org.apache.lucene.index.Test2BTerms.test2BTerms(Test2BTerms.java:154)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      

      The index built and optimized successfully; the AIOOBE was hit only when we ran searches for the random terms we had collected along the way.

      I suspect this is a bug somewhere in the compact in-RAM terms index ... I'll dig.
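
      The report does not identify the root cause at this point. As an illustration only, here is a minimal sketch of one class of bug that would fit the symptom (the index builds fine, and lookups fail only once the structure is very large): a byte position computed with 32-bit arithmetic over a paged buffer. The class name, the block size, and both methods are hypothetical and are not taken from TermInfosReaderIndex or from the attached patches.

```java
// Hypothetical sketch: NOT the actual Lucene code or the confirmed cause of LUCENE-4635.
// It shows how an absolute position in a paged byte store can silently wrap around
// when the multiplication is done in int rather than long.
public class PagedOffsetSketch {

    // Assumed block size for illustration: 32 KB pages (2^15 bytes).
    static final int BLOCK_BITS = 15;
    static final int BLOCK_SIZE = 1 << BLOCK_BITS;

    // Buggy variant: blockIndex * BLOCK_SIZE is evaluated as an int and overflows
    // once the product exceeds Integer.MAX_VALUE; only the wrapped result is widened.
    static long positionBuggy(int blockIndex, int offsetInBlock) {
        return blockIndex * BLOCK_SIZE + offsetInBlock;
    }

    // Fixed variant: widen to long before multiplying, so the product cannot wrap.
    static long positionFixed(int blockIndex, int offsetInBlock) {
        return ((long) blockIndex * BLOCK_SIZE) + offsetInBlock;
    }

    public static void main(String[] args) {
        // Small structures: both variants agree, so ordinary tests pass.
        System.out.println(positionBuggy(3, 7) == positionFixed(3, 7));

        // 70,000 blocks of 32 KB is more than 2 GB, beyond Integer.MAX_VALUE.
        int blockIndex = 70000;
        System.out.println("buggy: " + positionBuggy(blockIndex, 5));
        System.out.println("fixed: " + positionFixed(blockIndex, 5));
    }
}
```

      In a sketch like this the two variants agree for small inputs and diverge only past 2 GB, which is why such a defect would surface only in a test at the scale of the modified Test2BTerms run. A wrapped position later used to derive an array index could plausibly produce an ArrayIndexOutOfBoundsException, but that link is an assumption here.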

        Attachments

        1. LUCENE-4635.patch
          0.5 kB
          Michael McCandless
        2. LUCENE-4635.patch
          4 kB
          Michael McCandless

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless