Description
Currently, Tika handles custom metadata from OpenDocument files. Any custom metadata is returned with a custom: prefix (see OpenOfficeParserTest#testOO2Metadata for an example).
The parsers for the Microsoft file formats don't include custom metadata in their output, and neither does the PDF parser.
Assuming we're happy with including custom document metadata in the parsing step, with the custom: prefix, I'll go ahead and add it for the Microsoft (OLE2 and OOXML) and PDF parsers.
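As a rough illustration of the convention proposed above, here is a minimal sketch in plain Java. It is not Tika's parser code: the class and method names are hypothetical, a plain Map stands in for Tika's Metadata object, and the sample property name is made up. It only shows custom properties being stored under custom:-prefixed keys next to the standard ones, and read back by prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "custom:" naming convention; not part of Tika.
public class CustomMetadataSketch {
    // Prefix named in this issue for custom document properties.
    public static final String CUSTOM_PREFIX = "custom:";

    // Copies each custom property into the metadata map under a prefixed key,
    // so it cannot collide with a standard key of the same name.
    public static void addCustomProperties(Map<String, String> metadata,
                                           Map<String, String> customProperties) {
        for (Map.Entry<String, String> e : customProperties.entrySet()) {
            metadata.put(CUSTOM_PREFIX + e.getKey(), e.getValue());
        }
    }

    // Returns only the custom entries, with the prefix stripped from each key.
    public static Map<String, String> customOnly(Map<String, String> metadata) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            if (e.getKey().startsWith(CUSTOM_PREFIX)) {
                result.put(e.getKey().substring(CUSTOM_PREFIX.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Report");           // standard property

        Map<String, String> custom = new LinkedHashMap<>();
        custom.put("Info 1", "Text 1");            // made-up custom property

        addCustomProperties(metadata, custom);
        System.out.println(metadata);
        System.out.println(customOnly(metadata));
    }
}
```

Keeping the prefix on the key means callers that only know the standard property names are unaffected, while callers that want the custom properties can filter on the prefix.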
Issue Links
- relates to TIKA-438 Parse and return the complete set of custom document properties from MS Office documents (Closed)