Tika / TIKA-2395

The parser does not support AutoCloseInputStream anymore


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Won't Fix
    • Affects Version/s: 1.15
    • Fix Version/s: None
    • Component/s: detector, parser
    • Labels: None

    Description

      After upgrading to 1.15 (from 1.14), the detector no longer seems to support all kinds of InputStream as it used to.

      I get many occurrences of the following exception:

      org.apache.tika.io.TaggedIOException: mark/reset not supported
      	at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
      	at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
      	at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
      	at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
      	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
      	at org.apache.tika.Tika.parseToString(Tika.java:527)
      	at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
      	at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
      	at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
      	at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
      	at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
      	at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
      	at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.IOException: mark/reset not supported
      	at java.io.InputStream.reset(InputStream.java:348)
      	at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
      	at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
      	... 13 common frames omitted
      

      This regression makes Tika unusable for us.
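      The "Caused by" frame at InputStream.java:348 is the default java.io.InputStream behaviour: a subclass that does not override mark/reset reports markSupported() as false, and its reset() throws IOException("mark/reset not supported"). The MimeTypes.detect frame shows that detection rewinds the stream after reading from it, so any stream that cannot reset fails at that point. The self-contained JDK-only sketch below reproduces that failure without Tika and shows one common caller-side guard: wrapping the stream in a BufferedInputStream before handing it over. NoMarkInputStream, peekPrefix and ensureMarkSupported are hypothetical names for illustration only; peekPrefix merely mimics a mark/read/reset detector and is not Tika's code. The guard assumes the stream reports markSupported() truthfully, and whether it avoids the failure in the setup reported here is not established (the issue was resolved as Won't Fix).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MarkResetDemo {

    // Hypothetical stream standing in for a source without mark/reset support:
    // it only overrides read(), so markSupported() stays false and reset()
    // keeps the java.io.InputStream default, which throws IOException.
    static class NoMarkInputStream extends InputStream {
        private final InputStream delegate;

        NoMarkInputStream(byte[] data) {
            this.delegate = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }
    }

    // Hypothetical helper mimicking a magic-byte detector: mark the stream,
    // read up to `length` bytes, then rewind so a parser sees the full content.
    static byte[] peekPrefix(InputStream in, int length) throws IOException {
        in.mark(length);
        try {
            byte[] prefix = new byte[length];
            int total = 0;
            while (total < length) {
                int n = in.read(prefix, total, length - total);
                if (n == -1) {
                    break; // stream is shorter than the requested prefix
                }
                total += n;
            }
            return Arrays.copyOf(prefix, total);
        } finally {
            // On a stream without mark support this throws
            // IOException("mark/reset not supported").
            in.reset();
        }
    }

    // Caller-side guard: wrap in BufferedInputStream only when the stream
    // reports that it cannot mark/reset. Assumes markSupported() is truthful.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);

        try {
            peekPrefix(new NoMarkInputStream(data), 4);
        } catch (IOException e) {
            System.out.println("Unwrapped stream: " + e.getMessage());
        }

        InputStream safe = ensureMarkSupported(new NoMarkInputStream(data));
        byte[] prefix = peekPrefix(safe, 4);
        System.out.println("Prefix: " + new String(prefix, StandardCharsets.US_ASCII));
        // After reset, the wrapped stream starts again at the first byte.
        System.out.println("Next byte after reset: " + (char) safe.read());
    }
}
```

      Running main prints "Unwrapped stream: mark/reset not supported", then "Prefix: %PDF" and "Next byte after reset: %". Wrapping only when markSupported() is false leaves streams that already support mark/reset untouched, avoiding a redundant second buffer.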


    People

      Assignee: Unassigned
      Reporter: Thomas Mortagne (tmortagne)
      Votes: 0
      Watchers: 2
