Lucene - Core: LUCENE-1072

NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Linux CentOS 5 x86_64 running on 2-core Pentium D, Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_01-b06, mixed mode), using lucene-core-2007-11-29_02-49-31

    • Lucene Fields:
      New

      Description

      In my case, documents containing unusually large "words" (in fact, text-encoded images) occasionally show up during indexing.
      Attempting to add a document with a field that contains such a token throws java.lang.IllegalArgumentException:
      java.lang.IllegalArgumentException: term length 37944 exceeds max term length 16383
      at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1492)
      at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
      at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
      at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
      at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
      at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
      at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
      at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)

      This is expected; the exception is caught and ignored. The problem is that afterwards the IndexWriter is left in a partially corrupted state, and subsequent attempts to add documents to the index fail as well, this time with a NullPointerException:
      java.lang.NullPointerException
      at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1497)
      at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
      at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
      at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
      at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
      at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
      at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
      at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)

      This is 100% reproducible.
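      Until the writer itself is fixed, one way to avoid triggering the first exception (and with it the corrupted writer state) is to screen field text before calling addDocument. The sketch below is a hypothetical helper, not part of Lucene: ImmenseTokenGuard, hasImmenseToken and dropImmenseTokens are illustrative names. It assumes tokens are whitespace-delimited, which may not match the analyzer actually used for indexing, and takes the 16383-char limit from the exception message above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical client-side guard (not part of Lucene). Assumes whitespace
 * tokenization, which may differ from the real analyzer's tokenization.
 */
public class ImmenseTokenGuard {
    // Limit taken from "exceeds max term length 16383"; measured in Java chars.
    static final int MAX_TERM_LENGTH = 16383;
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns true if any whitespace-delimited token is longer than maxLen chars. */
    static boolean hasImmenseToken(String text, int maxLen) {
        for (String token : WHITESPACE.split(text)) {
            if (token.length() > maxLen) {
                return true;
            }
        }
        return false;
    }

    /** Rebuilds the text, keeping only tokens of at most maxLen chars. */
    static String dropImmenseTokens(String text, int maxLen) {
        List<String> kept = new ArrayList<>();
        for (String token : WHITESPACE.split(text.trim())) {
            // Skip empty strings (from empty input) and over-long tokens.
            if (!token.isEmpty() && token.length() <= maxLen) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Build a token of the same length as the one in the report.
        StringBuilder blob = new StringBuilder();
        for (int i = 0; i < 37944; i++) {
            blob.append('A');
        }
        String text = "caption " + blob + " more words";
        System.out.println(hasImmenseToken(text, MAX_TERM_LENGTH));   // true
        System.out.println(dropImmenseTokens(text, MAX_TERM_LENGTH)); // caption more words
    }
}
```

      Applying the same limit at the analyzer level (Lucene's analysis package includes a LengthFilter token filter) would use the analyzer's own tokenization and is likely more faithful. The plain-string version is shown only because it is self-contained.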

        Attachments

        1. LUCENE-1072.take2.patch
          11 kB
          Michael McCandless
        2. LUCENE-1072.patch
          10 kB
          Michael McCandless


            People

            • Assignee:
              Michael McCandless (mikemccand)
              Reporter:
              Alexei Dets (dets)
            • Votes:
              0
              Watchers:
              0
