Tika / TIKA-2395

The parser does not support AutoCloseInputStream anymore

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Won't Fix
    • Affects Version/s: 1.15
    • Fix Version/s: None
    • Component/s: detector, parser
    • Labels: None

      Description

      After upgrading to 1.15 (from 1.14), the detector no longer seems to support every kind of InputStream the way it used to.

      I get tons of these exceptions:

      org.apache.tika.io.TaggedIOException: mark/reset not supported
      	at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
      	at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
      	at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
      	at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
      	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
      	at org.apache.tika.Tika.parseToString(Tika.java:527)
      	at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
      	at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
      	at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
      	at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
      	at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
      	at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
      	at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.IOException: mark/reset not supported
      	at java.io.InputStream.reset(InputStream.java:348)
      	at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
      	at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
      	... 13 common frames omitted
      

      This regression makes Tika unusable for us.
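
The root cause in the trace above is java.io.InputStream.reset, which throws "mark/reset not supported" whenever a stream does not override mark/reset. The sketch below reproduces that condition with plain java.io and shows one possible caller-side workaround: wrapping the stream in a BufferedInputStream before handing it to a detector. It is an illustration under assumptions, not Tika's code and not a fix confirmed in this ticket; NoMarkStream, ensureMarkSupported and peek are hypothetical names introduced here.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetSketch {

    /** Hypothetical stand-in for any stream that does not support mark/reset. */
    static final class NoMarkStream extends InputStream {
        private final InputStream delegate;

        NoMarkStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
        // mark(), reset() and markSupported() are inherited from InputStream:
        // markSupported() returns false and reset() throws IOException.
    }

    /** Returns the stream unchanged if it supports mark/reset, else a buffered wrapper. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n bytes and rewinds, roughly what magic-byte detection needs to do. */
    static byte[] peek(InputStream in, int n) throws IOException {
        in.mark(n);
        try {
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break; // end of stream before n bytes were available
                }
                total += r;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // throws IOException if the stream cannot rewind
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);

        // Peeking at the raw stream fails on reset(), as in the reported trace.
        try {
            peek(new NoMarkStream(new ByteArrayInputStream(data)), 4);
        } catch (IOException e) {
            System.out.println("raw stream failed: " + e.getMessage());
        }

        // Wrapping first lets the same peek succeed and leaves the stream rewound.
        InputStream safe = ensureMarkSupported(new NoMarkStream(new ByteArrayInputStream(data)));
        byte[] head = peek(safe, 4);
        System.out.println("peeked: " + new String(head, StandardCharsets.US_ASCII));
    }
}
```

A caller would apply ensureMarkSupported to its own stream before passing it on. Whether that is appropriate depends on how the stream is produced and closed, which this ticket does not describe.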

              People

              • Assignee: Unassigned
              • Reporter: Thomas Mortagne (tmortagne)
              • Votes: 0
              • Watchers: 2
