  1. PDFBox
  2. PDFBOX-959

Text extraction slow and /tmp fills up with AWT font files

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • Component/s: Text extraction
    • Labels:
      None

      Description

      Text extraction does not need AWT fonts.
      However, the current Type1C font code always creates the AWT font during initialization.

      This has several really bad side effects:
      1. Wasted time creating the AWT font.
      2. The font files are copied into /tmp which fills up after a few thousand text extractions.
      3. The AWT font is created in a synchronized region, so font creation is single-threaded.

      The patch is quite simple: delay creation of the AWT font until it is required.
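
The idea of deferring the AWT font can be sketched roughly as follows. This is not the actual patch or PDFBox code: the class `LazyFont`, its methods, and the placeholder `buildAwtFont` are hypothetical names used only for illustration, and the expensive step (which in the real code would create the AWT font and its temporary file) is replaced by a counter so the behaviour is easy to observe.

```java
// Hypothetical illustration of lazy initialization; none of these names
// come from PDFBox.
class LazyFont {
    private final byte[] fontBytes; // raw embedded font program
    private Object awtFont;         // stays null until rendering asks for it
    private int buildCount;         // how often the expensive step ran

    LazyFont(byte[] fontBytes) {
        // Initialization only stores the data; nothing expensive happens here.
        this.fontBytes = fontBytes;
    }

    // Text extraction path: works from the font data alone.
    int programLength() {
        return fontBytes.length;
    }

    // Rendering path: the expensive object is built on first use and reused.
    synchronized Object getAwtFont() {
        if (awtFont == null) {
            awtFont = buildAwtFont();
        }
        return awtFont;
    }

    synchronized int getBuildCount() {
        return buildCount;
    }

    // Placeholder for the costly step (assumed to stand in for AWT font
    // creation, which is where the temporary files would be written).
    private Object buildAwtFont() {
        buildCount++;
        return new Object();
    }
}
```

With this shape, code that only extracts text never calls `getAwtFont()`, so it pays neither the creation cost nor the synchronization, and no temporary font file is written for it.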

            People

            • Assignee:
              lehmi Andreas Lehmkühler
              Reporter:
              kevinjackson Kevin Jackson