PDFBox / PDFBOX-5097

Rendered pdf image lacks all the text in this particular case

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.22
    • Fix Version/s: None
    • Component/s: Rendering
    • Labels:
    • Environment:
      Linux DamianPad 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

      Description

      Hello,

      I am using PDFBox to convert input PDF files to images, which are later fed to an OCR library. It works perfectly in most cases, but I stumbled upon one particular file for which all the text is missing from the rendered images.

      Here is the method that converts the PDF into images:

       

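      import java.awt.image.BufferedImage;
      import java.io.File;
      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;
      import org.apache.pdfbox.pdmodel.PDDocument;
      import org.apache.pdfbox.rendering.PDFRenderer;

      // Renders every page of the PDF to an image (PDFBox 2.0.x API).
      public List<BufferedImage> splitPdf(File pdfFile) throws IOException {
          List<BufferedImage> result = new ArrayList<>();

          // try-with-resources closes the document even if rendering throws
          try (PDDocument document = PDDocument.load(pdfFile)) {
              PDFRenderer pdfRenderer = new PDFRenderer(document);
              for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
                  // renderImage(int) renders at the default scale (72 DPI)
                  BufferedImage pageImage = pdfRenderer.renderImage(pageIndex);
                  result.add(pageImage);
                  debugPageImageInfo(pageImage); // my own logging helper, not shown here
              }
          }

          return result;
      }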
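
The debugPageImageInfo helper called above is not shown in this report. As a rough illustration only, a helper of that kind could log the image size and the share of non-white pixels, which gives a quick hint when a page has rendered without its text. The class name PageImageInfo and the methods nonWhiteFraction and describe are hypothetical names chosen for this sketch; only standard java.awt.image calls are used, and it is not the actual helper from my project.

```java
import java.awt.image.BufferedImage;
import java.util.Locale;

// Hypothetical stand-in for the debugPageImageInfo helper (an assumption, not the original code).
class PageImageInfo {

    // Returns the fraction of pixels whose RGB value is not pure white (alpha is ignored).
    static double nonWhiteFraction(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        long nonWhite = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if ((image.getRGB(x, y) & 0xFFFFFF) != 0xFFFFFF) {
                    nonWhite++;
                }
            }
        }
        return (double) nonWhite / ((long) width * height);
    }

    // Builds a one-line summary: dimensions, BufferedImage type constant, non-white share.
    static String describe(BufferedImage image) {
        return String.format(Locale.ROOT, "%dx%d px, type=%d, non-white=%.4f",
                image.getWidth(), image.getHeight(), image.getType(),
                nonWhiteFraction(image));
    }

    // Prints the summary for one rendered page.
    static void debugPageImageInfo(BufferedImage image) {
        System.out.println(describe(image));
    }

    public static void main(String[] args) {
        // Synthetic "page": white background with a black strip on the left.
        BufferedImage page = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 50; y++) {
            for (int x = 0; x < 100; x++) {
                page.setRGB(x, y, x < 10 ? 0x000000 : 0xFFFFFF);
            }
        }
        debugPageImageInfo(page);
    }
}
```

A very low non-white share on a page that should contain text is only a hint that something went missing; lines and borders still count as non-white pixels, so the number needs to be compared against a page that renders correctly.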
      

       

      I have attached the PDF file that shows the problem, along with the resulting images.

       

      I hope this is helpful for anyone else encountering the same problem!

       

        Attachments

        1. 0.png
          30 kB
          Robert-Andrei Damian
        2. 1.png
          26 kB
          Robert-Andrei Damian
        3. document(3).pdf
          140 kB
          Robert-Andrei Damian

            People

            • Assignee: Unassigned
            • Reporter: Robert-Andrei Damian (damianr13)
            • Votes: 0
            • Watchers: 2
