Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-1522

Some PDF files are causing exception (java.io.IOException: Error: Could not find font(COSName{F53.0}) in map=)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.7.1
    • Fix Version/s: 1.8.0
    • Component/s: Utilities
    • Labels:
      None
    • Environment:
      RHEL 6

      Description

      I am using PDFBox 1.7.1 and when parsing some PDF files, it is throwing exceptions and it's filling the Tomcat log very quickly (100MB in few seconds). There was another bug filed related to this issue. I tried the patch supplied in that bug but the issue is still there. I want to mention that the text gets extracted successfully from the PDF. But it just throws a log of WARN messages in the logs. As a workaround, I have set the LOG level to ERROR to avoid those WARN messages.

      Here is the problematic PDF file:
      http://doratst.uark.edu/fedora/repository/default%3A1590/OBJ/Traveler20120822.pdf

      Related bug:
      https://issues.apache.org/jira/browse/PDFBOX-1359#comment-13584669

      I am getting the following exception:

      WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
      java.lang.NullPointerException
      WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.lang.NullPointerException
      java.lang.NullPointerException
      WARN 2013-02-22 14:41:19,519 (PDFStreamEngine) java.io.IOException: Error: Could not find font(COSName

      {F53.0}) in map={F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923, F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}
      java.io.IOException: Error: Could not find font(COSName{F53.0}

      ) in map=

      {F50.1=org.apache.pdfbox.pdmodel.font.PDType1Font@50246923, F51.0=org.apache.pdfbox.pdmodel.font.PDType1Font@672a1f0}

      at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:57)
      at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
      at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
      at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
      at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:67)
      at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
      at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
      at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
      at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
      at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
      at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:247)
      at dk.defxws.fedoragsearch.server.TransformerToText.getTextFromPDF(TransformerToText.java:335)
      at dk.defxws.fedoragsearch.server.TransformerToText.getText(TransformerToText.java:194)
      at dk.defxws.fedoragsearch.server.GenericOperationsImpl.getDatastreamText(GenericOperationsImpl.java:668)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:399)
      at org.apache.xalan.extensions.ExtensionHandlerJavaClass.callFunction(ExtensionHandlerJavaClass.java:438)
      at org.apache.xalan.extensions.ExtensionsTable.extFunction(ExtensionsTable.java:220)
      at org.apache.xalan.transformer.TransformerImpl.extFunction(TransformerImpl.java:473)
      at org.apache.xpath.functions.FuncExtFunction.execute(FuncExtFunction.java:206)
      at org.apache.xpath.Expression.executeCharsToContentHandler(Expression.java:311)

        Attachments

          Activity

            People

            • Assignee:
              lehmi Andreas Lehmkühler
              Reporter:
              dukuonline Diwakar Timilsina
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: