Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Version: 2.0.14
Description
This was reported by slavago in TIKA-2832. The file is confidential, but I have a copy. Initial findings:
- The file is AES-256 encrypted with an empty user password
- The file has about 1000 objects
- The file is a tagged PDF
- The HashMap in SecurityHandler grows to 100,000 entries?!
- Using an IdentityHashMap speeds up the process dramatically (the file is parsed in a few seconds), and it may also be a better solution than what was done in PDFBOX-4453
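The IdentityHashMap observation comes down to how the two maps compare keys. HashMap calls each key's hashCode() and equals(), while IdentityHashMap (as its Javadoc says, intentionally violating the general Map contract) compares keys with == and hashes them with System.identityHashCode, so the keys' own methods never run. The following is a minimal sketch, not PDFBox code: SeenObjects and FakeCosObject are hypothetical names, and it assumes the tracked objects have content-based equals()/hashCode() whose cost grows with their size.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

public class SeenObjects {
    /** Hypothetical stand-in for a PDF object whose equals/hashCode look at its content. */
    static final class FakeCosObject {
        static int hashCodeCalls = 0; // how often hashCode() has run
        final byte[] content;

        FakeCosObject(byte[] content) {
            this.content = content;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeCosObject
                    && Arrays.equals(content, ((FakeCosObject) o).content);
        }

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return Arrays.hashCode(content); // cost grows with content length
        }
    }

    /** A set that compares elements with == and never calls their hashCode()/equals(). */
    static <T> Set<T> identitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }

    public static void main(String[] args) {
        FakeCosObject a = new FakeCosObject(new byte[] {1, 2, 3});
        FakeCosObject b = new FakeCosObject(new byte[] {1, 2, 3}); // equal to a, different instance

        Set<FakeCosObject> byEquals = new HashSet<>();
        byEquals.add(a);
        byEquals.add(b); // not added: b.equals(a)
        System.out.println("HashSet: size=" + byEquals.size()
                + ", hashCode calls=" + FakeCosObject.hashCodeCalls);

        FakeCosObject.hashCodeCalls = 0;
        Set<FakeCosObject> byIdentity = identitySet();
        byIdentity.add(a);
        byIdentity.add(b); // added: a != b
        System.out.println("identity set: size=" + byIdentity.size()
                + ", hashCode calls=" + FakeCosObject.hashCodeCalls);
    }
}
```

Running main prints `HashSet: size=1, hashCode calls=2` and then `identity set: size=2, hashCode calls=0`. The trade-off is that the identity set treats two equal but distinct objects as separate entries, which is only correct if "already processed" is meant per instance.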
Todo:
- Reread the IdentityHashMap documentation
- Find out why the HashMap grows so much. Could identical objects be stored twice? Or does the file have many direct objects?
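One general Java mechanism by which the same object could be stored twice, as the second Todo item asks, is a key whose hashCode() changes after insertion: a later lookup uses the new hash, misses the existing entry, and adds the same instance again. Whether this happens in SecurityHandler is not established here. The sketch below only illustrates the mechanism, assuming a hypothetical MutableObj (in a hypothetical MutableKeyDemo class) whose content is rewritten in place between the two add() calls.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

public class MutableKeyDemo {
    /** Hypothetical object whose hashCode depends on mutable content. */
    static final class MutableObj {
        byte[] content;

        MutableObj(byte[] content) {
            this.content = content;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableObj && Arrays.equals(content, ((MutableObj) o).content);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(content);
        }
    }

    /** Adds obj, rewrites its content in place, adds the same instance again; returns the set size. */
    static int addMutateAdd(Set<MutableObj> seen, MutableObj obj, byte[] newContent) {
        seen.add(obj);
        obj.content = newContent; // hypothetical in-place rewrite changes the content hash
        seen.add(obj);            // a hash-based set looks in the wrong place and adds it again
        return seen.size();
    }

    public static void main(String[] args) {
        Set<MutableObj> byEquals = new HashSet<>();
        int equalsSize = addMutateAdd(byEquals, new MutableObj(new byte[] {1, 2, 3}), new byte[] {7, 8, 9, 10});

        Set<MutableObj> byIdentity = Collections.newSetFromMap(new IdentityHashMap<>());
        int identitySize = addMutateAdd(byIdentity, new MutableObj(new byte[] {1, 2, 3}), new byte[] {7, 8, 9, 10});

        System.out.println("HashSet entries for one instance: " + equalsSize);
        System.out.println("identity set entries for one instance: " + identitySize);
    }
}
```

With these inputs the HashSet ends up holding the one instance twice, while the IdentityHashMap-backed set holds it once, because an object's identity hash does not change over its lifetime.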
Issue Links
- relates to:
  - TIKA-2832 Very slow large PDF text extraction (Open)
  - PDFBOX-4453 Encrypted string not decrypted (Closed)