Description
When we parse XPS files using the AutoDetectParser, the extracted text is always an empty string.
If we use DefaultDetector.detect(), it correctly detects the media type as "application/vnd.ms-xpsdocument".
However, this page suggests that XPS (application/vnd.ms-xpsdocument) is supported:
https://tika.apache.org/1.16/formats.html
Our code:
InputStream bis = this.getClass().getResourceAsStream("/" + EXPECTED_LOCATION + "doc_xps.xps");
Metadata metadata = new Metadata();
BodyContentHandler handler = new BodyContentHandler();
AutoDetectParser parser = new AutoDetectParser();
TikaInputStream tikaStream = TikaInputStream.get(bis);
parser.parse(tikaStream, handler, metadata);
String parsedText = handler.toString();
I will attach doc_xps.xps if I can.