PDFBOX-959: Text extraction slow and /tmp fills up with AWT font files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • Component/s: Text extraction
    • Labels: None

    Description

      During text extraction there is NO need to create AWT fonts.
      However, the current Type1C font code always creates the AWT font during initialization.

      This has several serious side effects:
      1. Time is wasted creating the AWT font.
      2. The font files are copied into /tmp, which fills up after a few thousand text extractions.
      3. The AWT font is created in a synchronized region, so font creation is single-threaded.

      The patch is quite simple: it delays creation of the AWT font until it is actually required.
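The general idea of deferring the AWT font can be illustrated with a small lazy-initialization holder. This is a minimal sketch under stated assumptions, not the attached patch (whose contents are not reproduced here): the class `Lazy` and its methods `get()` and `isCreated()` are hypothetical names, and `T` stands in for `java.awt.Font`. Double-checked locking on a `volatile` field means the expensive factory runs at most once, only when a caller first asks for the font, and callers after that never take the lock.

```java
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical lazy holder: the wrapped value (for this issue, the AWT font)
 * is built only on the first call to get(), never in the constructor.
 */
final class Lazy<T> {
    private final Supplier<? extends T> factory; // expensive creation step, deferred
    private volatile T value;                    // stays null until first get()

    Lazy(Supplier<? extends T> factory) {
        this.factory = Objects.requireNonNull(factory);
    }

    /** Returns the value, creating it exactly once on first use (thread-safe). */
    T get() {
        T v = value;                 // fast path: no lock once the value exists
        if (v == null) {
            synchronized (this) {    // only callers racing on first use contend here
                v = value;
                if (v == null) {
                    v = Objects.requireNonNull(factory.get(), "factory returned null");
                    value = v;       // volatile write publishes the fully built object
                }
            }
        }
        return v;
    }

    /** True once get() has built the value; a text-extraction-only run leaves this false. */
    boolean isCreated() {
        return value != null;
    }
}
```

In a font class this might look like `private final Lazy<java.awt.Font> awtFont = new Lazy<>(this::buildAwtFont);`, where `buildAwtFont` is a hypothetical method wrapping the existing AWT conversion. Text extraction, which only needs encodings and glyph widths, would never call `awtFont.get()`, so no temporary font file is written and no lock is taken on that path.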

      Attachments

        1. PDType1CFont.java.patch (1.0 kB, Kevin Jackson)


          People

            Andreas Lehmkühler (lehmi)
            Kevin Jackson (kevinjackson)
            Votes: 0
            Watchers: 1
