PDFBox / PDFBOX-4470

Red areas around text when converting a pdf to png with pdfbox

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.13
    • Fix Version/s: 2.0.14, 3.0.0 PDFBox
    • Component/s: Rendering
    • Labels: None

    Description

      I'm trying to convert a PDF to a PNG file using PDFBox. Unfortunately, the output has weird red areas in some places. I'm not sure what causes it; it only happens with some PDF files.

      Here's some of the code that I'm using:

          public static BufferedImage generateFromPdf(String ref, InputStream stream, int pageIndex, PreviewMode mode) throws IOException {
              PDDocument doc = null;
              try (InputStream buffered = new BufferedInputStream(stream)) {
                  doc = PDDocument.load(buffered, PDF_LOADING_MEMORY_SETTING);
                  if (pageIndex > doc.getNumberOfPages()) {
                      return null;
                  }
                  PDFRenderer renderer = new PDFRenderer(doc);
                  return rasterizePdfBox(ref, pageIndex, renderer, mode);
              } finally {
                  if (doc != null) {
                      doc.close();
                  }
              }
          }
      

      and then:

          private static BufferedImage rasterizePdfBox(String ref, int pageIndex, PDFRenderer renderer, PreviewMode mode) throws IOException {
              Future<BufferedImage> result = executorService.submit(() -> {
                  LOGGER.info(String.format("Generate preview for ref: %s, page: %s, mode: %s ", ref, pageIndex, mode.name()));
                  return renderer.renderImageWithDPI(pageIndex - 1, mode.getDpi(), ImageType.RGB);
              });
              try {
                  return result.get();
              } catch (InterruptedException | ExecutionException e) {
                  LOGGER.error(String.format("Error when generating preview: %s", e.getMessage()));
                  Thread.currentThread().interrupt();
                  throw new IOException(e.getMessage());
              }
          }
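
A side note on the exception handling in the snippet above, unrelated to the red areas: it restores the interrupt flag for `ExecutionException` as well, and it drops the original exception when wrapping it in `IOException`. The following is only a minimal sketch of one way to separate the two cases; `RenderTasks` and `submitAndWait` are hypothetical names that do not appear in the original code.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the reporter's code or of PDFBox.
final class RenderTasks {
    private RenderTasks() {}

    /** Runs the task on the executor and waits for it, translating failures to IOException. */
    static <T> T submitAndWait(ExecutorService executor, Callable<T> task) throws IOException {
        Future<T> future = executor.submit(task);
        try {
            return future.get();
        } catch (InterruptedException e) {
            // Only an interrupt of the waiting thread should restore the interrupt flag.
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for the render task", e);
        } catch (ExecutionException e) {
            // The task itself failed; keep the original exception as the cause.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException("Render task failed: " + cause, cause);
        }
    }
}
```

With such a helper, the rendering lambda would be passed as the `task` argument, and a stack trace from a failed render would still point at the original exception.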
      

      So far I've only figured out that the places which are red in the output are blank when I open the PDF in `Master PDF Editor` on Linux. They look normal, though, when I open it with `Document Viewer`.

      Some hints:

      • The PDFs with problems have been scanned. I can select text in the parts that render correctly, but not in the places that have the red overlay on them. Maybe it has something to do with OCR?
      • If I run the Linux tool `convert not-working-pdf.pdf converted.pdf` and then convert the resulting file to PNG, the issue no longer appears.
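
To compare the rendered output before and after the `convert` round trip without inspecting every PNG by eye, one option is to count strongly red pixels in the `BufferedImage` returned by the renderer. This is only a rough sketch: `RedAreaCheck` and `redFraction` are hypothetical names, and the colour thresholds are assumptions that would need tuning to the actual artifact colour. Pages with legitimate red content would also be counted.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class RedAreaCheck {
    private RedAreaCheck() {}

    /** Returns the fraction of pixels that are strongly red (high R, low G and B). */
    static double redFraction(BufferedImage image) {
        long red = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Thresholds are an assumption; adjust them to the observed artifact colour.
                if (r > 200 && g < 60 && b < 60) {
                    red++;
                }
            }
        }
        long total = (long) image.getWidth() * image.getHeight();
        return (double) red / total;
    }
}
```

A noticeably higher fraction for the original file than for the converted one would give a quick, scriptable signal that a given PDF is affected.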

          People

            Assignee: Tilman Hausherr (tilman)
            Reporter: Michał Pomarański (linoor)