Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
ManifoldCF 2.17
None
Description
When crawling some sites (for instance this one: http://www.antibes-juanlespins.com/ ), the job manages to index some documents, but then stops with the following error:
Error: IO error: utf-8; filename=rseventspro_rss20_56.xml
Here is the MCF stack trace:
Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
at java.io.InputStreamReader.<init>(InputStreamReader.java:100) ~[?:1.8.0_212]
at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
... 3 more
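
The "Caused by" line shows that the string handed to InputStreamReader as the charset name was "utf-8; filename=rseventspro_rss20_56.xml", which suggests a trailing header parameter was left attached to the encoding value. The following is a minimal sketch of one defensive way to clean such a value before use. It is not the fix applied in ManifoldCF; the class CharsetParam and the method charsetFrom are hypothetical names introduced only for illustration, and the choice of fallback charset is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ManifoldCF.
class CharsetParam {

    /**
     * Resolves a charset from a raw value that may carry trailing
     * parameters, e.g. "utf-8; filename=rseventspro_rss20_56.xml".
     * Returns the fallback when the name is missing, malformed, or
     * not supported by this JVM.
     */
    static Charset charsetFrom(String rawValue, Charset fallback) {
        if (rawValue == null) {
            return fallback;
        }
        // Keep only the text before the first ';' so that extra
        // parameters such as "filename=..." are discarded.
        int semi = rawValue.indexOf(';');
        String name = (semi >= 0 ? rawValue.substring(0, semi) : rawValue).trim();
        // Strip one pair of surrounding double quotes, if present.
        if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
            name = name.substring(1, name.length() - 1).trim();
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and
            // UnsupportedCharsetException, both subclasses.
            return fallback;
        }
    }

    /** Convenience overload that falls back to UTF-8 (an assumption). */
    static Charset charsetFrom(String rawValue) {
        return charsetFrom(rawValue, StandardCharsets.UTF_8);
    }
}
```

With this sketch, CharsetParam.charsetFrom("utf-8; filename=rseventspro_rss20_56.xml") resolves to UTF-8 instead of raising an exception, and the resulting Charset can be passed to the InputStreamReader(InputStream, Charset) constructor, which does not throw UnsupportedEncodingException.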