  1. PDFBox
  2. PDFBOX-4894

Invalid file offsets for PDF files larger than 2G

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.20
    • Fix Version/s: 2.0.21, 3.0.0 PDFBox
    • Component/s: Parsing
    • Labels:
      None
    • Environment:
      Linux

      Description

      A 32-bit int is used to calculate file offsets for COS objects. This works for small PDF files but breaks when the file is larger than 2 GB. For many large files (136 out of 216 in my sample set), integer overflow produces negative file offsets for some of the COS objects, which results in an IOException being thrown in COSParser.java at line 728. Note that these negative offsets are not valid object stream references.

      I have fixed the problem in my local copy of the code by modifying PDFXrefStreamParser.java starting at line 158.

      Current code:

      int offset = 0;
      for (int i = 0; i < w1; i++)
      {
          offset += (currLine[i + w0] & 0x00ff) << ((w1 - i - 1) * 8);
      }
      

      New code:

      long offset = 0;
      for (int i = 0; i < w1; i++)
      {
          offset += ((long)(currLine[i + w0] & 0x00ff)) << ((w1 - i - 1) * 8);
      }
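
      To make the difference concrete, here is a minimal standalone sketch that runs both loops on the same bytes. The class name XrefOffsetDemo, the method names, and the sample entry are hypothetical and chosen only for illustration; they are not taken from the PDFBox sources.

```java
public class XrefOffsetDemo {

    // Mirrors the current code: the accumulator and every shifted term are 32-bit ints.
    public static int decodeAsInt(byte[] currLine, int w0, int w1) {
        int offset = 0;
        for (int i = 0; i < w1; i++) {
            offset += (currLine[i + w0] & 0x00ff) << ((w1 - i - 1) * 8);
        }
        return offset;
    }

    // Mirrors the proposed code: each byte is widened to long before it is shifted.
    public static long decodeAsLong(byte[] currLine, int w0, int w1) {
        long offset = 0;
        for (int i = 0; i < w1; i++) {
            offset += ((long) (currLine[i + w0] & 0x00ff)) << ((w1 - i - 1) * 8);
        }
        return offset;
    }

    public static void main(String[] args) {
        // Hypothetical entry: one type byte (w0 = 1), then a 4-byte offset field (w1 = 4)
        // holding 0x80000000, the first offset that no longer fits in a signed int.
        byte[] currLine = {1, (byte) 0x80, 0x00, 0x00, 0x00};
        System.out.println(decodeAsInt(currLine, 1, 4));   // -2147483648
        System.out.println(decodeAsLong(currLine, 1, 4));  // 2147483648
    }
}
```

      The cast has to be applied before the shift. Java masks the shift distance of an int to its low five bits, so with an offset field wider than four bytes the int version would also misplace the high-order bytes rather than only flipping the sign.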
      

      I can submit a sample PDF file if desired (it will be more than 2 GB in size).

            People

            • Assignee:
              Andreas Lehmkühler (lehmi)
            • Reporter:
              Carl Grundstrom (cgrundstrom)
            • Votes:
              0
            • Watchers:
              4
