Solr / SOLR-12985

ClassNotFoundException when indexing encrypted documents

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 7.3.1
    • None
    • None

    Description

      When indexing a BLOB containing an encrypted Office document (xls or xlsx, but I think all types are affected), the import fails with a ClassNotFoundException wrapped in a TikaException. If the document is not encrypted, it works fine.
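A rough way to tell the two cases apart before a BLOB reaches Tika: an unencrypted .xlsx/.docx/.pptx is a ZIP archive, while a password-encrypted OOXML file is wrapped in an OLE2 (Compound File Binary) container, which is consistent with the trace going through OfficeParser rather than an OOXML parser. The sketch below only inspects the leading magic bytes; OfficeContainerCheck, detect and looksLikeEncryptedOoxml are hypothetical names, and it does not detect password-protected legacy .xls/.doc files, which are OLE2 whether or not they are encrypted.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: classifies an Office BLOB by its leading magic bytes.
final class OfficeContainerCheck {
    enum Container { ZIP, OLE2, UNKNOWN }

    // ZIP local file header "PK\3\4": the container used by unencrypted .xlsx/.docx/.pptx.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 / Compound File Binary signature: legacy .xls/.doc and encrypted OOXML.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private OfficeContainerCheck() {}

    static Container detect(byte[] head) {
        if (startsWith(head, ZIP_MAGIC)) return Container.ZIP;
        if (startsWith(head, OLE2_MAGIC)) return Container.OLE2;
        return Container.UNKNOWN;
    }

    // Heuristic only: an OOXML file name whose bytes are OLE2 is most likely password-encrypted.
    // Password-protected legacy .xls/.doc files are OLE2 anyway and are NOT detected here.
    static boolean looksLikeEncryptedOoxml(String fileName, byte[] head) {
        String name = fileName.toLowerCase(Locale.ROOT);
        boolean ooxmlName = name.endsWith(".xlsx") || name.endsWith(".docx") || name.endsWith(".pptx");
        return ooxmlName && detect(head) == Container.OLE2;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data != null
            && data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A caller would read the first eight bytes of each BLOB and skip or log the rows for which looksLikeEncryptedOoxml returns true, instead of letting the parse abort the import.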

      I'm using the DataImportHandler.

      The exception also seems to bypass onError="skip" and onError="continue", so the whole import fails.

      I tried moving the libraries from contrib/extraction/lib/ to server/lib; after that a different class is reported as not found, so it's a class loading issue.
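Since moving the jars changes which class is missing, the classes involved appear to be split across class loaders. A small diagnostic sketch that checks whether a class name resolves through the thread context class loader and through the loader that defined a given anchor class; ClassVisibility, isVisible and report are hypothetical names, not part of Solr, Tika or POI.

```java
// Hypothetical diagnostic helper: can a class be resolved through a given class loader?
final class ClassVisibility {
    private ClassVisibility() {}

    // Resolves className without running static initializers; false if the loader cannot see it.
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // a null loader means the bootstrap loader
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Compares the thread context loader with the loader that defined `anchor` (e.g. a POI class).
    static String report(String className, Class<?> anchor) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader defining = anchor.getClassLoader();
        return className
            + " context=" + isVisible(className, context)
            + " defining=" + isVisible(className, defining);
    }
}
```

Called from the importing thread as report("org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder", org.apache.poi.poifs.crypt.EncryptionInfo.class), a result of context=false defining=true would suggest the builder lookup goes through a loader that cannot see the extraction jars; how to run such code inside the DIH thread is left open here.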

      This is the base exception:

      Exception while processing: document_index document : SolrInputDocument(fields: [site=187, index_type=document, resource_id=3, title_full=Dati cliente.docx, id=d-XXX-3, publish_date=2018-09-28 00:00:00.0, abstract= Azioni di recupero intraprese sulle Fatture telefoniche, insert_date=2019-09-28 00:00:00.0, type=Documenti, url=http://]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
          at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
          at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
          at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
          at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
          at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:364)
          at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
          at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:452)
          at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:485)
          at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@500efcf1
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
          at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
          at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
          ... 10 more
      Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
          at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150)
          at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
          at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
          at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
          ... 13 more
      Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
          at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:565)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222)
          at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148)
          ... 17 more
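The innermost frames show EncryptionInfo.getBuilder asking Jetty's WebAppClassLoader for the builder class by name. If that lookup goes through the thread context class loader (the WebAppClassLoader frame suggests it, but the trace alone does not prove it), a common workaround pattern is to install the loader that loaded POI as the context loader for the duration of the parse and restore it afterwards. A minimal sketch; ContextClassLoaderScope and callWith are hypothetical names, and where such a wrapper would be applied inside Solr or the DataImportHandler is not established by this report.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: temporarily swaps the thread context class loader.
final class ContextClassLoaderScope {
    private ContextClassLoaderScope() {}

    // Runs `action` with `loader` as the context class loader; always restores the previous one.
    static <T> T callWith(ClassLoader loader, Callable<T> action) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            current.setContextClassLoader(previous); // restored even if the action throws
        }
    }
}
```

The try/finally matters here: DIH reuses its import thread, so a context loader left behind after a failed parse could affect every later document.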

      Attachments

        1. notcrypted.docx (11 kB, Luca)
        2. crypted.xlsx (16 kB, Luca)
        3. schema.zip (19 kB, Luca)
        4. logs.zip (30 kB, Luca)
        5. db.sql (21 kB, Luca)

            People

              Assignee: Unassigned
              Reporter: Luca (lucaver)
              Votes: 0
              Watchers: 4
