Description
See the following command-line extraction:
lmcgibbn@LMC-056430 /usr/local/any23(master) $ ./cli/target/appassembler/bin/any23 rover -l output.log -o extraction.json https://www.jobcluster.de
------------------------------------------------------------------------
Apache Any23 :: rover
------------------------------------------------------------------------
0 [main] WARN org.apache.tika.parser.image.ImageParser - JBIG2ImageReader not loaded. jbig2 files will be ignored
128 [main] INFO org.apache.any23.rdf.PopularPrefixes - Loading prefixes from /org/apache/any23/prefixes/prefixes.properties
1388 [main] WARN org.apache.commons.httpclient.HttpMethodBase - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
4790 [main] INFO org.apache.any23.extractor.SingleDocumentExtraction - Processing https://www.jobcluster.de/
[Fatal Error] :12:46: The entity name must immediately follow the '&' in the entity reference.
------------------------------------------------------------------------
Apache Any23 FAILURE
Execution terminated with errors: Error while parsing RDF document.
Total time: 5s
Finished at: Tue Dec 12 08:01:14 PST 2017
Final Memory: 31M/184M
------------------------------------------------------------------------
This produces the attached extraction result (extraction.json) and the associated log (output.log).
When the same extraction is run through the service at any23.org, the (partial) extraction result should be returned regardless of whether the entire extraction succeeded.
The service servlet appears to return the extraction exception instead of the preferred extraction result. This issue will fix that.
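The requested behaviour can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Any23's actual servlet code: the `Extraction` interface, the `Outcome` class and the `runKeepingPartialOutput` method are hypothetical names. The idea is to buffer whatever the extractor writes, record the exception if one is thrown, and hand both back, so the caller can still return the partial output instead of only the error.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class PartialExtractionDemo {

    /** Hypothetical stand-in for an extractor that streams its output as it goes. */
    interface Extraction {
        void run(OutputStream out) throws IOException;
    }

    /** Whatever output was produced, plus the error (null if the run succeeded). */
    static final class Outcome {
        final String body;
        final Exception error;

        Outcome(String body, Exception error) {
            this.body = body;
            this.error = error;
        }
    }

    /** Runs the extraction, keeping the buffered output even if it fails midway. */
    static Outcome runKeepingPartialOutput(Extraction extraction) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Exception error = null;
        try {
            extraction.run(buffer);
        } catch (IOException | RuntimeException e) {
            // Remember the failure, but do not discard what was already written.
            error = e;
        }
        return new Outcome(new String(buffer.toByteArray(), StandardCharsets.UTF_8), error);
    }

    public static void main(String[] args) {
        // Simulated extraction: emits one N-Triples line, then hits a parse error.
        Extraction failingMidway = out -> {
            out.write("<http://example.org/s> <http://example.org/p> \"o\" .\n"
                    .getBytes(StandardCharsets.UTF_8));
            throw new IOException("Error while parsing RDF document.");
        };
        Outcome outcome = runKeepingPartialOutput(failingMidway);
        System.out.print(outcome.body);
        if (outcome.error != null) {
            System.err.println("Extraction incomplete: " + outcome.error.getMessage());
        }
    }
}
```

A servlet structured this way could write `outcome.body` to the response and report `outcome.error` separately (for example in a log entry); how the Any23 service should actually surface the error is not specified here.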
Attachments
Issue Links
- links to