Description
Rights Management Service (RMS), implemented in MS Office as Information Rights Management (IRM), allows organizations to set file permissions that are stored within the file. In most cases, this will result in the file getting a new extension (with a prefix p, such as .txt becoming .ptxt), but in the case of MS Office and PDF files, which support this natively, the implementation results in the file contents being encrypted without any extension change.
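The extension change described above can be illustrated with a minimal sketch. It is not part of Tika: the class `RmsExtensionHeuristic` and its methods are hypothetical, and the list of protected extensions is an assumption. Only `.ptxt` comes from the description; the other entries are illustrative and not exhaustive.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not a Tika API: flags file names whose extension
// looks like an RMS-protected variant (original extension prefixed with "p").
final class RmsExtensionHeuristic {

    // Assumed, non-exhaustive list. Only "ptxt" is taken from the description;
    // the remaining entries are illustrative.
    private static final Set<String> PROTECTED_EXTENSIONS =
            Set.of("ptxt", "pxml", "pjpg", "ppng");

    private RmsExtensionHeuristic() {
    }

    // Returns the lower-cased extension without the dot, or "" if there is none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True if the extension is in the assumed list of protected extensions.
    // An explicit list is used because many ordinary extensions (pdf, ppt, png)
    // also start with "p".
    static boolean looksProtected(String fileName) {
        return PROTECTED_EXTENSIONS.contains(extensionOf(fileName));
    }

    // Returns the presumed original extension (the protected one without its
    // leading "p"), or empty if the name does not look protected.
    static Optional<String> originalExtension(String fileName) {
        String ext = extensionOf(fileName);
        if (!PROTECTED_EXTENSIONS.contains(ext)) {
            return Optional.empty();
        }
        return Optional.of(ext.substring(1));
    }
}
```

A name-based check like this cannot catch the MS Office and PDF case, since those files keep their original extension; detecting them would require inspecting the file contents.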
Current behavior
Running such files through Tika produces the same results as running an empty file through DefaultParser and OfficeParser.
Expected behavior
Extract more metadata about the permissions required to view the file (if possible), and throw an EncryptedDocumentException, as is already the case for Office files encrypted in the more traditional manner.
Attachments
Issue Links
- is related to: TIKA-4082 Extraction from Microsoft Sharepoint protected PDFs doesn't expose exception like other parsers. (Resolved)
- links to