Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-3481

Localization in XRef generation results in unusable PDFs

    XMLWordPrintableJSON

Details

    • Patch

    Description

      PDFBox appears to be using a localized number formatter when encoding the XRef table. Depending on locale settings, this can result in unicode characters being used, which prevents PDFBox from loading the PDF.

      The following code demonstrates this:

      import java.io.File;
      import java.io.FileInputStream;
      import java.io.FileOutputStream;
      import java.util.Locale;
      import org.apache.pdfbox.pdmodel.PDDocument;
      import org.apache.pdfbox.pdmodel.PDPage;
      import org.apache.pdfbox.pdmodel.common.PDRectangle;
      
      class Example {
      
        public static void main(String [] args) throws Exception {
          File tempFile = File.createTempFile("example", ".pdf");
          Locale arabicLocale = new Locale.Builder().setLanguageTag("ar-EG-u-nu-arab").build();
          Locale.setDefault(arabicLocale);
      
          try (FileOutputStream out = new FileOutputStream(tempFile)) {
            PDDocument doc = new PDDocument();
            doc.addPage(new PDPage(PDRectangle.LETTER));
      
            doc.save(out);
            doc.close();
          }
      
          try (FileInputStream in = new FileInputStream(tempFile)) {
            PDDocument doc = PDDocument.load(in);
            // This will throw.
            doc.getPage(0);
          }
        }
      }
      

      Attachments

        1. xref_format.patch
          2 kB
          Edward Kupershlak

        Activity

          People

            tilman Tilman Hausherr
            ekupershlak@google.com Edward Kupershlak
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: