Solr / SOLR-9022

Solr is unable to handle/parse images embedded in Office documents (Word, XLS, etc.)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 5.4.1
    • Fix Version/s: None
    • Component/s: SolrJ
    • Labels: None

    Description

      When we index multiple files, Solr throws the exception below whenever it encounters images embedded in other documents.

      The issue arises because embedded image files are read with a generic binary MIME type (_attachment_mimetype=[application/octet-stream]), even though the attached files are actually of type png, txt, etc.
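
      To illustrate the mismatch described above, here is a minimal Python sketch of a name-based fallback: when the detected type is the generic application/octet-stream, the attachment's file name is used to guess something more specific. This is not how Kite Morphlines or Tika detect types, and refine_mimetype is a hypothetical helper introduced only for this example; the record fields are copied from the log below.

```python
import mimetypes

# Generic type reported for the embedded image in the log.
GENERIC_BINARY = "application/octet-stream"


def refine_mimetype(detected, attachment_name):
    """Hypothetical helper: fall back to the file name when detection is generic."""
    if detected != GENERIC_BINARY:
        # A specific type was already detected; keep it.
        return detected
    guessed, _encoding = mimetypes.guess_type(attachment_name)
    # Keep the generic type if the name gives no hint either.
    return guessed or detected


# Field values taken from the failing record in the stack trace.
record = {
    "_attachment_name": "xl/media/image2.png",
    "_attachment_mimetype": "application/octet-stream",
}
refined = refine_mimetype(record["_attachment_mimetype"], record["_attachment_name"])
print(refined)
```

      With a more specific type such as image/png, a rule that matches image types would have something to match on. Whether that is the right fix depends on the morphline configuration in use, which is not shown in this report.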

      Full stack trace for this issue

      2016-04-20 16:55:13,311 INFO [Thread-52] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added attempt_1460872551948_58233_m_000116_1 to list of failed maps
      2016-04-20 16:55:13,329 INFO [IPC Server handler 18 on 39376] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1460872551948_58233_m_000159_0 is : 0.0
      2016-04-20 16:55:13,342 ERROR [IPC Server handler 20 on 39376] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1460872551948_58233_m_000159_0 - exited : org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: tryRules command found no successful rule for record: {_attachment_body=[TikaInputStream of java.io.BufferedInputStream@391f1777], _attachment_mimetype=[application/octet-stream], _attachment_name=[xl/media/image2.png], id=[185be63a-e527-4953-9a3d-ae957dc0fa51]}
      at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
      at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:220)
      at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
      at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: tryRules command found no successful rule for record: {_attachment_body=[TikaInputStream of java.io.BufferedInputStream@391f1777], _attachment_mimetype=[application/octet-stream], _attachment_name=[xl/media/image2.png], id=[185be63a-e527-4953-9a3d-ae957dc0fa51]}
      at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:132)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.tika.DetectMimeTypeBuilder$DetectMimeType.doProcess(DetectMimeTypeBuilder.java:166)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.stdlib.GenerateUUIDBuilder$GenerateUUID.doProcess(GenerateUUIDBuilder.java:98)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.tika.decompress.EmbeddedExtractor.parseEmbedded(EmbeddedExtractor.java:57)
      at org.kitesdk.morphline.tika.decompress.UnpackBuilder$Unpack.parseEntry(UnpackBuilder.java:138)
      at org.kitesdk.morphline.tika.decompress.UnpackBuilder$Unpack.doProcess(UnpackBuilder.java:113)
      at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.stdlib.LogDebugBuilder$LogDebug.doProcess(LogDebugBuilder.java:58)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:115)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.tika.DetectMimeTypeBuilder$DetectMimeType.doProcess(DetectMimeTypeBuilder.java:166)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:208)
      ... 10 more

      2016-04-20 16:55:13,343 INFO [IPC Server handler 20 on 39376] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1460872551948_58233_m_000159_0: Error: org.kitesdk.morphline.api.MorphlineRuntimeException: org.kitesdk.morphline.api.MorphlineRuntimeException: tryRules command found no successful rule for record: {_attachment_body=[TikaInputStream of java.io.BufferedInputStream@391f1777], _attachment_mimetype=[application/octet-stream], _attachment_name=[xl/media/image2.png], id=[185be63a-e527-4953-9a3d-ae957dc0fa51]}
      at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
      at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:220)
      at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
      at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: tryRules command found no successful rule for record: {_attachment_body=[TikaInputStream of java.io.BufferedInputStream@391f1777], _attachment_mimetype=[application/octet-stream], _attachment_name=[xl/media/image2.png], id=[185be63a-e527-4953-9a3d-ae957dc0fa51]}
      at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:132)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.tika.DetectMimeTypeBuilder$DetectMimeType.doProcess(DetectMimeTypeBuilder.java:166)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.stdlib.GenerateUUIDBuilder$GenerateUUID.doProcess(GenerateUUIDBuilder.java:98)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.tika.decompress.EmbeddedExtractor.parseEmbedded(EmbeddedExtractor.java:57)
      at org.kitesdk.morphline.tika.decompress.UnpackBuilder$Unpack.parseEntry(UnpackBuilder.java:138)
      at org.kitesdk.morphline.tika.decompress.UnpackBuilder$Unpack.doProcess(UnpackBuilder.java:113)
      at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.stdlib.LogDebugBuilder$LogDebug.doProcess(LogDebugBuilder.java:58)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:115)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.tika.DetectMimeTypeBuilder$DetectMimeType.doProcess(DetectMimeTypeBuilder.java:166)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
      at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
      at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:208)
      ... 10 more

          People

            Assignee: Unassigned
            Reporter: Bhanuprakash Prathap (bprakashp1)
            Votes: 0
            Watchers: 2