PDFBox / PDFBOX-5815

Can't split the document into individual pages

Details

    Description

      When I split a document that contains links to internal pages into single pages, the Splitter class throws a NullPointerException.

       

      This is our code:

       

      PDDocument pdfDocument = Loader.loadPDF(new File("path/to/file.pdf"));
      Splitter splitter = new Splitter(); // default settings: one page per resulting document
      List<PDDocument> splitted = splitter.split(pdfDocument);

       

      This is the exception:

       

      java.lang.NullPointerException: Cannot invoke "org.apache.pdfbox.pdmodel.PDPage.getCOSObject()" because the return value of "org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination.getPage()" is null
          at org.apache.pdfbox.multipdf.Splitter.fixDestinations(Splitter.java:153)
          at org.apache.pdfbox.multipdf.Splitter.split(Splitter.java:136)

       

      I looked into the error and found that it fails in the fixDestinations method of the Splitter class.
       
      Here is the method definition:

      private void fixDestinations(PDDocument destinationDocument)
      {
          PDPageTree pageTree = destinationDocument.getPages();
          for (PDPageDestination pageDestination : destToFixSet)
          {
              COSDictionary srcPageDict = pageDestination.getPage().getCOSObject();
              COSDictionary dstPageDict = pageDictMap.get(srcPageDict);
              PDPage dstPage = new PDPage(dstPageDict);
              // Find whether destination is inside or outside
              if (pageTree.indexOf(dstPage) >= 0)
              {
                  pageDestination.setPage(dstPage);
              }
              else
              {
                  pageDestination.setPage(null);
              }
          }
      } 

      What's the problem:

      pageDestination.getPage() returns null because the document contains links to internal pages. When the document is split page by page, the linked page is no longer part of the resulting document, so there is no valid page left to link to.

       

      Possible solution:

      Check the returned page and, if it is null, set the page of the destination to null. I suggest something like this:

       

      private void fixDestinations(PDDocument destinationDocument)
      {
          PDPageTree pageTree = destinationDocument.getPages();
          for (PDPageDestination pageDestination : destToFixSet)
          {
              PDPage srcPage = pageDestination.getPage();
              if (srcPage != null)
              {
                  COSDictionary srcPageDict = srcPage.getCOSObject();
                  COSDictionary dstPageDict = pageDictMap.get(srcPageDict);
                  PDPage dstPage = new PDPage(dstPageDict);
                  // Find whether destination is inside or outside
                  if (pageTree.indexOf(dstPage) >= 0)
                  {
                      pageDestination.setPage(dstPage);
                  }
                  else
                  {
                      pageDestination.setPage(null);
                  }
              }
              else
              {
                  pageDestination.setPage(null);
              }
          }
      } 
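
      To show the null check in isolation, here is a minimal sketch that can be run without PDFBox. Page, Destination and DestinationFixer are hypothetical stand-ins for PDPage, PDPageDestination and the Splitter logic, not real PDFBox classes. The sketch also assumes that the lookup of the cloned page can come back empty and guards against that too; I have not verified whether that case can occur in Splitter.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for PDPage (NOT a PDFBox class).
class Page
{
    final String label;

    Page(String label)
    {
        this.label = label;
    }
}

// Hypothetical stand-in for PDPageDestination (NOT a PDFBox class).
class Destination
{
    private Page page; // may be null, like a destination without a page reference

    Destination(Page page)
    {
        this.page = page;
    }

    Page getPage()
    {
        return page;
    }

    void setPage(Page page)
    {
        this.page = page;
    }
}

class DestinationFixer
{
    // Points each destination at the cloned page, or clears it when the source
    // page is missing, has no clone, or the clone is not in the split result.
    static void fixDestinations(List<Destination> destinations,
                                Map<Page, Page> clonedPages,
                                List<Page> resultPages)
    {
        for (Destination destination : destinations)
        {
            Page srcPage = destination.getPage();
            // Only look up a clone when there is a source page at all
            Page dstPage = srcPage == null ? null : clonedPages.get(srcPage);
            if (dstPage != null && resultPages.contains(dstPage))
            {
                destination.setPage(dstPage);
            }
            else
            {
                destination.setPage(null);
            }
        }
    }
}
```

      With this structure a destination without a page is simply cleared instead of causing an exception, which is the same behaviour as the proposed change above.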

       

      I've attached an example file. Thanks.

       

      Attachments

        1. CTU.pdf
          905 kB
          Nicolò Rossi

            People

              Assignee: Unassigned
              Reporter: Nicolò Rossi (nikox.96r)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: 2h
                  Remaining Estimate: 2h
                  Time Spent: Not Specified