Tika / TIKA-3246

IllegalArgumentException when generation of appearances fails


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.25
    • Fix Version/s: 2.0.0, 1.26
    • Component/s: parser
    • Labels: None

    Description

      java.lang.IllegalArgumentException: No glyph for U+0041 (A) in font BZZZZZ+Aladin-Regular
      	at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:372)
      	at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:422)
      	at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:332)
      	at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:363)
      	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.calculateFontSize(AppearanceGeneratorHelper.java:859)
      	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:494)
      	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:422)
      	at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:232)
      	at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:264)
      	at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:327)
      	at org.apache.pdfbox.pdmodel.fixup.processor.AcroFormGenerateAppearancesProcessor.process(AcroFormGenerateAppearancesProcessor.java:54)
      	at org.apache.pdfbox.pdmodel.fixup.AcroFormDefaultFixup.apply(AcroFormDefaultFixup.java:56)
      	at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:132)
      	at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:113)
      	at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:267)
      

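      This is related to a change in PDFBox's PDDocumentCatalog.getAcroForm(), where we try to "fix" fields that exist as annotations but not as fields. As the stack trace shows, getAcroForm() applies AcroFormDefaultFixup, which regenerates the field appearances; calculating the font size for a text field then fails because the embedded font has no glyph for a character in the field value. I wonder whether this fixup is needed at all.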

      This happens with several files, including the two AML files from PDFBOX-4086.
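      One defensive pattern for a caller such as PDFParser.extractMetadata() is to try the default getAcroForm() first and, if appearance generation throws IllegalArgumentException, retry without the fixup. The sketch below is a minimal, stdlib-only illustration of that pattern under stated assumptions: the class AcroFormFallback and its withFallback() method are hypothetical names, not part of Tika or PDFBox, and this is not necessarily the change that resolved this issue.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of Tika or PDFBox: tries an operation that may
 * trigger PDFBox's AcroForm appearance generation and, if that fails with an
 * IllegalArgumentException, falls back to a variant that skips the fixup.
 */
public final class AcroFormFallback {

    private AcroFormFallback() {
        // static utility; no instances
    }

    /**
     * Returns withFixup.get(), or withoutFixup.get() if the first call throws
     * IllegalArgumentException (e.g. "No glyph for U+0041 (A) in font ...").
     * Any other exception propagates unchanged.
     */
    public static <T> T withFallback(Supplier<T> withFixup, Supplier<T> withoutFixup) {
        try {
            return withFixup.get();
        } catch (IllegalArgumentException e) {
            // Appearance generation failed; retry without the fixup.
            return withoutFixup.get();
        }
    }

    public static void main(String[] args) {
        // Simulated calls standing in for catalog.getAcroForm() and a
        // fixup-free variant; no PDFBox classes are used here.
        String result = withFallback(
                () -> { throw new IllegalArgumentException("No glyph for U+0041 (A)"); },
                () -> "AcroForm without fixup");
        System.out.println(result); // prints "AcroForm without fixup"
    }
}
```

      In the stack trace, getAcroForm() delegates to a second getAcroForm overload. Assuming a PDFBox 2.0.x release where that overload is getAcroForm(PDDocumentFixup) and a null argument skips the fixup, the two suppliers would be () -> catalog.getAcroForm() and () -> catalog.getAcroForm(null). Catching only IllegalArgumentException keeps unrelated failures visible. Note that the first attempt may already have regenerated some field appearances before it failed.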

    People

    • Assignee: Tilman Hausherr
    • Reporter: Tilman Hausherr
