PDFBox / PDFBOX-4477

Large encrypted file takes days to be parsed

Details

    Description

      As reported by slavago in TIKA-2832. File is confidential but I have it. Initial findings:

      • File is AES256 encrypted with empty user password
      • File has about 1000 objects
      • File is a tagged PDF
      • HashMap in SecurityHandler grows to 100000?!
      • Using an IdentityHashMap speeds up the process dramatically (parsed in a few seconds), and it may also be a better solution than what was done in PDFBOX-4453
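
      One possible reason for the speedup, sketched under assumptions: a HashMap or HashSet calls hashCode() (and equals() on collisions) on every key, while an IdentityHashMap uses System.identityHashCode and reference comparison and never calls either. If the keys have expensive, content-based hashCode()/equals() implementations, that difference adds up. The class and method names below (HashCostSketch, CountingKey, hashCallsFor) are hypothetical and only illustrate the effect; this is not the PDFBox code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

public class HashCostSketch {
    /** Hypothetical key type that counts how often hashCode() is invoked. */
    static final class CountingKey {
        static int hashCalls = 0;
        final int id;

        CountingKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            hashCalls++; // stands in for an expensive content-based hash
            return id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }
    }

    /** Adds n fresh keys to the set and returns the number of hashCode() calls made. */
    static int hashCallsFor(Set<CountingKey> set, int n) {
        CountingKey.hashCalls = 0;
        for (int i = 0; i < n; i++) {
            set.add(new CountingKey(i));
        }
        return CountingKey.hashCalls;
    }

    public static void main(String[] args) {
        // Equality-based set: every add() goes through CountingKey.hashCode().
        int viaEquals = hashCallsFor(new HashSet<CountingKey>(), 1000);
        // Identity-based set: keys are compared with ==, hashCode() is never called.
        int viaIdentity = hashCallsFor(
                Collections.newSetFromMap(new IdentityHashMap<CountingKey, Boolean>()), 1000);
        System.out.println("HashSet hashCode() calls: " + viaEquals);
        System.out.println("identity set hashCode() calls: " + viaIdentity);
    }
}
```

      The trade-off is that an identity-based set treats two equal but distinct instances as different entries, so it only fits when "already processed" really means "this exact instance".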

      Todo:

      • Read description of IdentityHashMap again
      • Find out why the HashMap grows so much. Could it be that identical objects are stored twice? Or does the file have many direct objects?
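
      A rough way to approach the second question, as a sketch only: collect the objects that end up in the map and compare how many are distinct by equals() with how many are distinct by reference. If the identity count is higher, equal but separate instances are being stored. The helper names (countByEquals, countByIdentity) are hypothetical, and plain lists stand in for the real objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DuplicateDiagnosticSketch {
    /** Number of entries that are distinct according to equals()/hashCode(). */
    static int countByEquals(List<?> objects) {
        return new HashSet<Object>(objects).size();
    }

    /** Number of entries that are distinct instances (compared with ==). */
    static int countByIdentity(List<?> objects) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        seen.addAll(objects);
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("x"));
        List<String> b = new ArrayList<>(Arrays.asList("x")); // equal to a, but a different instance
        List<Object> visited = Arrays.asList(a, b, a);        // a is visited twice

        System.out.println("distinct by equals:   " + countByEquals(visited));   // 1
        System.out.println("distinct by identity: " + countByIdentity(visited)); // 2
    }
}
```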

            People

              tilman Tilman Hausherr