Description
Hello,
I am using PDFBox to convert input PDF files to images, which are later fed to an OCR library. This works perfectly in most cases, but I ran into one particular file for which all of the text is missing from the rendered images.
Here is the source code of the method that converts the PDF into images:
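import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public List<BufferedImage> splitPdf(File pdfFile) throws IOException {
    List<BufferedImage> result = new ArrayList<>();
    // try-with-resources closes the document even if rendering throws
    try (PDDocument document = PDDocument.load(pdfFile)) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        for (int pageIndex = 0; pageIndex < document.getNumberOfPages(); pageIndex++) {
            // Render one page at the default resolution
            BufferedImage pageImage = pdfRenderer.renderImage(pageIndex);
            result.add(pageImage);
            debugPageImageInfo(pageImage);
        }
    }
    return result;
}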
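The debugPageImageInfo helper called above is not shown in this report. As a rough illustration only, here is a sketch of what such a helper could compute to flag pages whose text went missing: the image size and the fraction of dark pixels. The class name PageImageDebug, the methods darkPixelRatio and describe, and the luminance threshold are all hypothetical and are not part of PDFBox; the code uses only java.awt.image.BufferedImage.

```java
import java.awt.image.BufferedImage;
import java.util.Locale;

// Hypothetical debugging helper; not part of PDFBox or of the original report.
final class PageImageDebug {

    // Returns the fraction of pixels whose luminance is below the threshold (0-255).
    // A page that should contain text but yields a ratio near 0 is suspicious.
    static double darkPixelRatio(BufferedImage image, int threshold) {
        int width = image.getWidth();
        int height = image.getHeight();
        long dark = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived luminance
                int luminance = (r * 299 + g * 587 + b * 114) / 1000;
                if (luminance < threshold) {
                    dark++;
                }
            }
        }
        return (double) dark / ((long) width * height);
    }

    // Builds a one-line summary suitable for logging.
    static String describe(BufferedImage image) {
        return String.format(Locale.ROOT, "%dx%d px, type=%d, darkRatio=%.4f",
                image.getWidth(), image.getHeight(), image.getType(),
                darkPixelRatio(image, 128)); // 128 is an assumed threshold
    }
}
```

Logging PageImageDebug.describe(pageImage) for each rendered page would make it easy to spot pages that come out almost entirely white, although a low ratio alone does not say why the text was not drawn.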
I have attached to this issue the PDF file that triggers the problem, together with the resulting images.
I hope this is helpful for anyone else encountering the same problem!