Description
After upgrading from 1.4 to 1.5, the detector no longer seems to support all kinds of InputStream the way it used to.
I get tons of:
org.apache.tika.io.TaggedIOException: mark/reset not supported
    at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
    at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:170)
    at org.apache.tika.io.TikaInputStream.reset(TikaInputStream.java:673)
    at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:474)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:115)
    at org.apache.tika.Tika.parseToString(Tika.java:527)
    at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getContentAsText(AbstractSolrMetadataExtractor.java:509)
    at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:111)
    at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:93)
    at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:133)
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:504)
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:411)
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.run(DefaultSolrIndexer.java:377)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: mark/reset not supported
    at java.io.InputStream.reset(InputStream.java:348)
    at org.apache.commons.io.input.ProxyInputStream.reset(ProxyInputStream.java:169)
    at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:168)
    ... 13 common frames omitted
This regression makes Tika unusable for us.
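One caller-side workaround that might be worth trying (an untested sketch, not something confirmed against Tika 1.5): wrap the stream in a java.io.BufferedInputStream before handing it over, so that mark/reset is served from the wrapper's own buffer instead of being delegated to the underlying stream. The example below uses only java.io. NoResetStream, buffered and peek are hypothetical names made up for illustration, and peek only imitates the mark/read/reset pattern visible in the stack trace.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MarkResetSketch {

    /** Hypothetical stand-in for a stream whose reset() fails, as in the trace above. */
    static class NoResetStream extends FilterInputStream {
        NoResetStream(InputStream in) {
            super(in);
        }

        @Override
        public void mark(int readlimit) {
            // Intentionally does nothing.
        }

        @Override
        public void reset() throws IOException {
            throw new IOException("mark/reset not supported");
        }
    }

    /** Wraps unconditionally, so mark/reset never reaches the underlying stream. */
    static InputStream buffered(InputStream in) {
        return new BufferedInputStream(in);
    }

    /** Reads up to n leading bytes, then rewinds: the pattern a type detector relies on. */
    static byte[] peek(InputStream in, int n) throws IOException {
        in.mark(n);
        try {
            byte[] buf = new byte[n];
            int total = 0;
            int r;
            while (total < n && (r = in.read(buf, total, n - total)) != -1) {
                total += r;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset();
        }
    }
}
```

Calling peek directly on a NoResetStream throws the same "mark/reset not supported" IOException, while calling it on buffered(stream) rewinds cleanly. Whether the same holds once the wrapped stream goes through Tika.parseToString is an assumption that would need checking.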
Issue Links
- is caused by: IO-568 AutoCloseInputStream sometimes breaks mark/reset contract (Open)