PDFBox: PDFBOX-4477

Large encrypted file takes days to be parsed


      Description

      As reported by Slava G in TIKA-2832. The file is confidential, but I have a copy. Initial findings:

      • File is AES256 encrypted with empty user password
      • File has about 1000 objects
      • File is a tagged PDF
      • HashMap in SecurityHandler grows to 100000 entries?!
      • Using an IdentityHashMap speeds up the process dramatically (the file is parsed in a few seconds), and it may also be a better solution than what was done in PDFBOX-4453
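
      A minimal Java sketch of the semantic difference behind the IdentityHashMap finding. According to its Javadoc, IdentityHashMap compares keys with == and hashes them with System.identityHashCode, whereas HashMap relies on equals() and hashCode(). The names ContentObject, equalityTracker, identityTracker and countNew are hypothetical and do not belong to PDFBox's SecurityHandler; the sketch assumes the tracked objects define content-based equals()/hashCode(). It does not explain why the PDFBox map grows so large, which is still an open question in this issue.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

public class IdentitySetDemo {

    // Hypothetical stand-in for a PDF object whose equals/hashCode depend on
    // content (an assumption for illustration; this is not PDFBox's COSString).
    static final class ContentObject {
        private final String content;

        ContentObject(String content) {
            this.content = content;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ContentObject
                    && ((ContentObject) o).content.equals(content);
        }

        @Override
        public int hashCode() {
            return content.hashCode();
        }
    }

    // Hypothetical "already processed" tracker keyed on equals()/hashCode().
    static Set<Object> equalityTracker() {
        return new HashSet<>();
    }

    // Hypothetical "already processed" tracker keyed on reference identity (==).
    static Set<Object> identityTracker() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }

    // Adds each object to the tracker and counts how many it treats as new.
    static int countNew(Set<Object> tracker, Object... objects) {
        int fresh = 0;
        for (Object o : objects) {
            if (tracker.add(o)) {
                fresh++;
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        ContentObject a = new ContentObject("same bytes");
        ContentObject b = new ContentObject("same bytes"); // distinct instance, equal content
        // The HashSet reports b as already seen, so b would be skipped.
        System.out.println(countNew(equalityTracker(), a, b, a)); // prints 1
        // The identity set keeps a and b apart but still catches the repeated a.
        System.out.println(countNew(identityTracker(), a, b, a)); // prints 2
    }
}
```

      With equality-based keys, a distinct object whose content matches an earlier one is reported as already seen, and every lookup pays for a content-based hashCode(), which can be costly for large nested structures. The identity-based set avoids both while still detecting the same instance passed twice.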

      Todo:

      • Read description of IdentityHashMap again
      • Find out why the HashMap grows so much. Could it be that identical objects are stored twice? Or does the file have many direct objects?
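
      One possible starting point for the second Todo item, sketched under assumptions: if the objects handed to the tracker were recorded in order (a hypothetical instrumentation step, not something PDFBox provides), three counts separate the two hypotheses. Many more additions than distinct instances means the same objects are passed in repeatedly; a large number of distinct instances means the file really yields that many separate objects (for example direct objects). Stats and summarize are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class TrackerStats {

    // Summary of a recorded sequence of objects passed to a "seen" tracker.
    static final class Stats {
        final int additions;         // every add call
        final int distinctInstances; // distinct by reference (==)
        final int distinctByEquals;  // distinct by equals()/hashCode()

        Stats(int additions, int distinctInstances, int distinctByEquals) {
            this.additions = additions;
            this.distinctInstances = distinctInstances;
            this.distinctByEquals = distinctByEquals;
        }

        @Override
        public String toString() {
            return "additions=" + additions
                    + ", distinctInstances=" + distinctInstances
                    + ", distinctByEquals=" + distinctByEquals;
        }
    }

    // Counts the recorded objects three ways: total, by identity, by equality.
    static Stats summarize(List<?> recorded) {
        Set<Object> byIdentity = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<Object> byEquals = new HashSet<>();
        for (Object o : recorded) {
            byIdentity.add(o);
            byEquals.add(o);
        }
        return new Stats(recorded.size(), byIdentity.size(), byEquals.size());
    }

    public static void main(String[] args) {
        String x = new String("obj");
        String y = new String("obj"); // equal content, different instance
        // prints additions=3, distinctInstances=2, distinctByEquals=1
        System.out.println(summarize(Arrays.asList(x, x, y)));
    }
}
```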


              People

              • Assignee: Tilman Hausherr (tilman)
                Reporter: Tilman Hausherr (tilman)
              • Votes: 0
                Watchers: 5
