PDFBox / PDFBOX-1185

A problem with indexed color spaces: bpc of the base color space seems wrong.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.7.0
    • Component/s: PDModel
    • Labels:
      None

      Description

      I incorporated the "proper" solution to PDFBOX-1075 into my regression tests, and the file that made me raise that issue broke again. It has pictures with indexed color spaces, which are now rendered as black. The indexed color space is one-bit, and the lookup table has two colors: black and white. With the current trunk, the black pixels remain black (0,0,0), but the white pixels are returned as (1,1,1), which in RGB is almost as black. The text in the picture is therefore unreadable.
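
For reference, the expected decoding of such an image can be illustrated with the JDK's own `java.awt.image.IndexColorModel`. This is only a minimal sketch, not PDFBox code: the class name `IndexedPaletteSketch` and the two-entry black/white palette are assumptions made for illustration. It shows that a 1-bit index into an 8-bit RGB palette should yield (255,255,255) for white, not (1,1,1).

```java
import java.awt.image.IndexColorModel;

// Hypothetical illustration class, not part of PDFBox.
final class IndexedPaletteSketch {

    // Builds a colour model with 1 bit per index and a two-entry palette:
    // index 0 = black, index 1 = white. Palette components are 8-bit values.
    static IndexColorModel blackWhiteModel() {
        byte[] r = {0, (byte) 0xFF};
        byte[] g = {0, (byte) 0xFF};
        byte[] b = {0, (byte) 0xFF};
        // First argument: bits per pixel (the index width). Second: palette size.
        return new IndexColorModel(1, 2, r, g, b);
    }

    public static void main(String[] args) {
        IndexColorModel cm = blackWhiteModel();
        // Index 0 decodes to (0,0,0) and index 1 to (255,255,255).
        System.out.printf("index 0 -> (%d,%d,%d)%n", cm.getRed(0), cm.getGreen(0), cm.getBlue(0));
        System.out.printf("index 1 -> (%d,%d,%d)%n", cm.getRed(1), cm.getGreen(1), cm.getBlue(1));
    }
}
```

The index width (1 bit) and the depth of the palette components (8 bits) are independent parameters here, which is the distinction this report is about.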

      On a second look it seems to me that the offending line is here (in PDPixelMap.getRGBImage):

      ColorModel baseColorModel = csIndexed.getBaseColorSpace().createColorModel(bpc);
      

      I think it's wrong: in an indexed color space, "bpc" is not "bits per component" but "bits per index", i.e. the number of bits in the integer that is interpreted as the index into the color lookup array. I think the base color space's color model should be initialized with a different number. I came up with the following calculation:

      PDIndexed csIndexed = (PDIndexed)colorspace;
      PDColorSpace baseCs = csIndexed.getBaseColorSpace();
      int numberOfColorValues = 1 << bpc;
      int highValue = csIndexed.getHighValue();
      int size = Math.min(numberOfColorValues-1, highValue);
      byte[] index = csIndexed.getLookupData();
      int parentComponentsCount = baseCs.getNumberOfComponents();
      int baseColorModelBPC = (index.length * 8) / ((size+1) * parentComponentsCount);
      ColorModel baseColorModel = csIndexed.getBaseColorSpace().createColorModel(baseColorModelBPC);
      

      The baseColorModelBPC is calculated as the total number of bits in the lookup array divided by the total number of components in all colors. This seems to work for my offending file and causes no regressions with files from PDFBOX-1075, PDFBOX-1010 and PDFBOX-706.
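
As a worked example of the calculation above, here is a small standalone sketch. The class and method names are hypothetical, and the lookup length of 6 bytes is an assumption (two RGB entries at one byte per component); the actual lookup stream of the offending file is not shown in this issue.

```java
// Hypothetical illustration class, not part of PDFBox.
final class BaseBpcSketch {

    // Mirrors the arithmetic from the proposed patch:
    // total bits in the lookup table divided by the total number of components.
    static int baseBitsPerComponent(int bpc, int highValue,
                                    int lookupLengthBytes, int baseComponents) {
        int numberOfColorValues = 1 << bpc;
        int size = Math.min(numberOfColorValues - 1, highValue);
        return (lookupLengthBytes * 8) / ((size + 1) * baseComponents);
    }

    public static void main(String[] args) {
        // Assumed values: 1-bit index, hival 1, 6-byte lookup, DeviceRGB (3 components).
        // (6 * 8) / ((1 + 1) * 3) = 48 / 6 = 8
        System.out.println(baseBitsPerComponent(1, 1, 6, 3));
        // Assumed full 8-bit palette: 256 RGB entries = 768 bytes.
        // (768 * 8) / (256 * 3) = 6144 / 768 = 8
        System.out.println(baseBitsPerComponent(8, 255, 768, 3));
    }
}
```

Under these assumptions both cases come out as 8, regardless of the index width.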

      What is odd, though, is a later line:

      byte[] inData = new byte[baseColorModel.getNumComponents()];
      

      Doesn't this effectively assume that baseColorModelBPC is always 8? If the base color model bpc were anything other than 8, this code couldn't handle it correctly anyway. Or am I overlooking something?

      I'll attach a patch which works for me. Note that simply changing

      ColorModel baseColorModel = csIndexed.getBaseColorSpace().createColorModel(bpc);
      

      into

      ColorModel baseColorModel = csIndexed.getBaseColorSpace().createColorModel(8);
      

      has exactly the same effect (fixed my problem, no regressions in those three earlier issues). Please decide what makes more sense.

      1. pdfbox-parentcsbpc.patch
        5 kB
        Antoni Mylka
      2. page11.png
        75 kB
        Antoni Mylka

        Activity

        Antoni Mylka added a comment -

        Attached a patch. Simply putting 8 instead of bpc in that method seems to work just as well. Please decide.

        Andreas Lehmkühler added a comment -

        Can you please attach the pdf in question?

        Antoni Mylka added a comment -

        Sorry I can't, it's confidential. I'll try to find a public one.

        Andreas Lehmkühler added a comment -

        Maybe it's enough to have a look at the contents of the XObject dictionary? Can you please attach a screenshot (using PDFDebugger) or a code snippet (using WriteDecodedDoc) of it to this issue?

        Antoni Mylka added a comment -

        The file in question has 25 pages. It contains some images. The images I mean are on page 11. They are all parts of a scanned page with some text. In Adobe Reader they look like a single image of the entire page.

        The attached image comes from iText RUPS and shows the object tree starting at page 11.

        The output from WriteDecodedDoc looks like this:

        116 0 obj
        <<
        /Type /XObject
        /Subtype /Image
        /Name /im3
        /Width 1910
        /Height 505
        /BitsPerComponent 1
        /ColorSpace [/Indexed /DeviceRGB 1 166 0 R]
        /Length 167 0 R
        >>
        stream
        ˙˙˙
        endstream
        endobj
        117 0 obj
        <<
        /Type /XObject
        /Subtype /Image
        /Name /im4
        /Width 1910
        /Height 505
        /BitsPerComponent 1
        /ColorSpace [/Indexed /DeviceRGB 1 168 0 R]
        /Length 169 0 R
        >>
        stream
        ˙˙˙
        endstream
        endobj
        118 0 obj
        <<
        /Type /XObject
        /Subtype /Image
        /Name /im5
        /Width 1910
        /Height 505
        /BitsPerComponent 1
        /ColorSpace [/Indexed /DeviceRGB 1 170 0 R]
        /Length 171 0 R
        >>
        stream
        ˙˙˙
        endstream
        endobj
        119 0 obj
        <<
        /Type /XObject
        /Subtype /Image
        /Name /im6
        /Width 1910
        /Height 502
        /BitsPerComponent 1
        /ColorSpace [/Indexed /DeviceRGB 1 172 0 R]
        /Length 173 0 R
        >>
        stream
        ˙˙˙
        endstream
        endobj

        Andreas Lehmkühler added a comment -

        Antoni is correct: the value used was wrong and only accidentally worked for the other examples. In that context bpc means bits per index, and since, according to the PDF spec, the indexed values are in the range 0 to 255, the bpc has to be 8. I fixed that in revision 1224733.

        @Antoni:
        Please verify if the solution also works with your pdf.
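
To make the distinction between index width and component depth concrete, here is a minimal sketch of decoding packed index samples through a lookup table. It is not the code from revision 1224733: the class name, the method, and the sample data are hypothetical, and it assumes a DeviceRGB base (3 components per entry), one byte per lookup component, samples packed most significant bit first, and an index width of 1, 2, 4 or 8 bits.

```java
// Hypothetical illustration class, not part of PDFBox.
final class IndexedUnpackSketch {

    // Converts one row of packed indices into 0xRRGGBB values.
    // bitsPerIndex is the image's /BitsPerComponent; every lookup byte is
    // treated as an 8-bit colour component regardless of bitsPerIndex.
    static int[] toRgb(byte[] packed, int width, int bitsPerIndex, byte[] lookup) {
        int[] rgb = new int[width];
        int mask = (1 << bitsPerIndex) - 1;
        for (int x = 0; x < width; x++) {
            int bitPos = x * bitsPerIndex;
            // Samples are stored from the high-order bits of each byte downwards.
            int shift = 8 - bitsPerIndex - (bitPos % 8);
            int index = ((packed[bitPos / 8] & 0xFF) >> shift) & mask;
            int base = index * 3; // assumed: 3 components per palette entry
            rgb[x] = ((lookup[base] & 0xFF) << 16)
                   | ((lookup[base + 1] & 0xFF) << 8)
                   | (lookup[base + 2] & 0xFF);
        }
        return rgb;
    }

    public static void main(String[] args) {
        byte[] lookup = {0, 0, 0, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF}; // black, white
        byte[] row = {(byte) 0xA0}; // bits 1010 0000, so the first four pixels are 1, 0, 1, 0
        for (int value : toRgb(row, 4, 1, lookup)) {
            System.out.printf("%06x%n", value);
        }
    }
}
```

With this sample data the white pixels decode to ffffff rather than 010101, which is the behaviour the fix is meant to restore.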

        Andreas Lehmkühler added a comment -

        I never got any feedback, but I'm pretty sure that this issue is fixed. Otherwise we'll simply reopen it.

        Antoni Mylka added a comment -

        Oh, sorry. Yes, it is fixed. It seems I overlooked the @Antoni message. Thanks very much.


          People

          • Assignee:
            Andreas Lehmkühler
            Reporter:
            Antoni Mylka
          • Votes:
            0
            Watchers:
            0
