  1. PDFBox
  2. PDFBOX-4847

[PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter


Details

    • Patch

    Description

      This patch was primarily intended to add access to raw image data (i.e. without any kind of color conversion or reduction). While implementing and testing it, I also found a bug in the ICC profile embedding in PNGConverter.

      This patch does the following:

      • Add a method getRawRaster() to PDImage. It returns the original raster data in 8 or 16 bit without any kind of color interpretation. Callers must know for themselves what to do with the data (e.g. to access the raw data of DeviceN images).
      • Add a method getRawImage(). It tries to return the raster obtained by getRawRaster() as a BufferedImage. This only succeeds if there is a matching Java ColorSpace for the color space of the image, i.e. only for ICCBased images. In theory this should also work for PDIndexed sRGB images, but I first have to find a PDF with such an image to test it.
      • Add a -noColorConversion switch to the ExtractImage utility to extract images in their original color space. For CMYK images this only works when a TIFF encoder (e.g. from TwelveMonkeys) is on the class path.
      • Add support for exporting PNGs with ICC profile data in ImageIOUtil.
      • Fix a bug in PNGConverter, which did not correctly embed the ICC profile from the PNG file.
      • PNGConverterTest now tests the raw images. When reading the PNG files for comparison, it also ensures that the embedded ICC profile is respected. The default PNG reader, at least up to JDK 11, does not respect the embedded ICC profile, i.e. the colors are wrong. There is a workaround for this in PNGConverterTest (which I have been using in production for years). See the screenshot for the correct color display of the png_rgb_romm_16.png test file (left side; macOS Preview app) and the wrong display (right side; Java; inside IDEA).
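
      To illustrate the idea behind getRawImage() and the PNG export with ICC data, here is a minimal sketch that uses only JDK classes. It is not the code from the patch: the class RawImageSketch and its methods wrapRaster and writePngWithProfile are hypothetical names, the raster stands in for whatever getRawRaster() would return, and the profile name "unknown" is an arbitrary choice.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.WritableRaster;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of PDFBox.
class RawImageSketch {

    // Wraps raw samples in a BufferedImage whose ColorModel carries the ICC
    // profile. The sample values are used as-is; no color conversion happens.
    static BufferedImage wrapRaster(WritableRaster raster, ICC_Profile profile) {
        ColorSpace colorSpace = new ICC_ColorSpace(profile);
        ColorModel colorModel = new ComponentColorModel(
                colorSpace, false, false, Transparency.OPAQUE, raster.getTransferType());
        return new BufferedImage(colorModel, raster, false, null);
    }

    // Encodes the image as PNG and adds an iCCP chunk with the given profile.
    static byte[] writePngWithProfile(BufferedImage image, ICC_Profile profile)
            throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        IIOMetadata metadata =
                writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), null);

        // The PNG metadata format expects the profile bytes already deflated.
        ByteArrayOutputStream deflated = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(deflated)) {
            out.write(profile.getData());
        }

        IIOMetadataNode iccp = new IIOMetadataNode("iCCP");
        iccp.setUserObject(deflated.toByteArray());
        iccp.setAttribute("profileName", "unknown"); // arbitrary placeholder name
        iccp.setAttribute("compressionMethod", "deflate");

        // Append the iCCP node to the native PNG metadata tree and merge it back.
        String format = metadata.getNativeMetadataFormatName();
        Node tree = metadata.getAsTree(format);
        tree.appendChild(iccp);
        metadata.mergeTree(format, tree);

        ByteArrayOutputStream png = new ByteArrayOutputStream();
        try (ImageOutputStream ios = new MemoryCacheImageOutputStream(png)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(image, null, metadata), null);
        } finally {
            writer.dispose();
        }
        return png.toByteArray();
    }
}
```

      With an 8 bit, three band interleaved raster and the JDK's built-in sRGB profile, wrapRaster yields an image backed by the same untouched samples, and writePngWithProfile produces a PNG that carries the profile. A viewer that honors iCCP chunks should then display the colors as intended; as noted above, the default Java PNG reader may not.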

       

      Besides helping to find bugs like the one in PNGConverter, access to the raw image also makes all kinds of interesting color handling possible. For example, a future patch could allow using the raw images when printing PDFs. If the PDF you want to print has images with a gamut larger than sRGB (e.g. images from modern cameras) and the target printer also has a gamut larger than sRGB (e.g. some inkjet photo printers), you will certainly see a difference in the resulting print. Such a mode would be rather slow, because the current sRGB image handling is optimized for speed, and using the original raw images would require on-demand color conversions in the printer driver. But you would get "high quality" output (at least with respect to colors).

      I don’t think this will be ready in time for the 2.0.20 release.

      Attachments

        1. 2.0-extractimage-raw.patch
          8 kB
          Emmeran Seehuber
        2. 2.0-raster-image_v2.patch
          23 kB
          Emmeran Seehuber
        3. 2.0-raw-raster.patch
          11 kB
          Emmeran Seehuber
        4. 2.0-raw-raster-v2.patch
          10 kB
          Emmeran Seehuber
        5. color_difference.png
          337 kB
          Emmeran Seehuber
        6. pdfbox-image-compare.patch
          10 kB
          Emmeran Seehuber
        7. pdfbox-rawimages.patch
          44 kB
          Emmeran Seehuber
        8. png-compress-icc-profile.patch
          2 kB
          Emmeran Seehuber

          People

            tilman Tilman Hausherr
            rototor Emmeran Seehuber
