  1. PDFBox
  2. PDFBOX-742

[patch] Please don't print logging statements to System.err

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: PDModel
    • Labels:
      None

      Description

      Three org.apache.pdfbox.filter.Filter implementations are not actually implemented:

      CCITTFaxDecodeFilter
      DCTFilter
      RunLengthDecodeFilter

      They all print to System.err, with messages like:

      Warning: DCTFilter.decode is not implemented yet, skipping this stream.

      In my code I iterate over all images in a PDF and try to obtain their raw, undecoded content. I use code like this:

      private byte[] getUnDecodedImageBytes(COSStream st) throws IOException {
          ByteArrayOutputStream baos = new ByteArrayOutputStream();
          IOUtil.writeStream(st.getUnfilteredStream(), baos);
          return baos.toByteArray();
      }

      The getUnfilteredStream() method, when called on embedded JPG images, seems to try to invoke the DCTFilter. If I have a large ebook file with lots of JPG images, this writes LOTS of text to standard error, and that output can't be suppressed.

      PDFBox uses commons-logging all over the place. Why not push those warnings to the log? They are non-critical. In my particular case, the above method returns an empty array for such images. When it does, I resort to another method:

      private byte[] getDecodedImageBytes(COSStream st) throws IOException {
          ByteArrayOutputStream baos = new ByteArrayOutputStream();
          PDXObjectImage ximage = (PDXObjectImage) PDXObject.createXObject(st);
          ximage.write2OutputStream(baos);
          return baos.toByteArray();
      }

      This seems to work, even for those images where getUnfilteredStream returns an empty stream.
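
The fallback described above (try the first method, and use the second one only when the first returns an empty array) can be sketched roughly as follows. The `ByteSource` interface and the `firstNonEmpty` helper are hypothetical names introduced for this illustration; they are not part of PDFBox. The PDFBox calls are replaced by simple lambdas so that the sketch compiles on its own.

```java
import java.io.IOException;

class ImageBytesFallback {

    /** Hypothetical abstraction over "some way of reading image bytes"; not a PDFBox type. */
    interface ByteSource {
        byte[] read() throws IOException;
    }

    /** Returns the primary result unless it is null or empty, in which case the fallback is used. */
    static byte[] firstNonEmpty(ByteSource primary, ByteSource fallback) throws IOException {
        byte[] bytes = primary.read();
        if (bytes != null && bytes.length > 0) {
            return bytes;
        }
        return fallback.read();
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the two methods in the report: the first yields nothing,
        // the second yields some bytes.
        ByteSource emptySource = () -> new byte[0];
        ByteSource workingSource = () -> new byte[] {1, 2, 3};

        byte[] result = firstNonEmpty(emptySource, workingSource);
        System.out.println("bytes obtained: " + result.length);
    }
}
```

With the two methods from this report, the call would look something like `firstNonEmpty(() -> getUnDecodedImageBytes(st), () -> getDecodedImageBytes(st))`.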

      I don't quite understand what the difference is, since I would expect a method named 'getUnfilteredStream' to return the stream as it is in the PDF file, without applying any Filters. Moreover, such a failure would imply that the library simply cannot process JPG images in PDF files, which is not the case, because write2OutputStream works fine. So I don't know where the real problem lies. Maybe someone with more PDFBox knowledge could take a look.

      Still, my patch only moves those warnings to the log, where I can suppress them. This is simple and fixes the immediate problem in my application.
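
To illustrate why routing the warning through a logger helps, here is a minimal sketch. It is not the attached patch: `java.util.logging` stands in for commons-logging so that the example has no dependencies, and the class name, logger name and `capture` helper are made up for the illustration. The point is only that a logged warning can be silenced by raising the logger level, whereas a `System.err.println` call cannot.

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

class FilterWarningSketch {

    // Hypothetical stand-in for a filter class such as DCTFilter; not PDFBox code.
    private static final Logger LOG = Logger.getLogger("sketch.filter.DCTFilter");

    static void decode() {
        // Before: System.err.println("Warning: ...") would always print.
        // After: the message goes through the logger and obeys its level.
        LOG.warning("DCTFilter.decode is not implemented yet, skipping this stream.");
    }

    /** Calls decode() the given number of times at the given logger level and returns the captured log text. */
    static String capture(Level level, int calls) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StreamHandler handler = new StreamHandler(sink, new SimpleFormatter());
        handler.setLevel(Level.ALL);
        LOG.setUseParentHandlers(false); // keep the console quiet for this demo
        LOG.addHandler(handler);
        LOG.setLevel(level);
        try {
            for (int i = 0; i < calls; i++) {
                decode();
            }
            handler.flush();
        } finally {
            LOG.removeHandler(handler);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println("logged at WARNING level: " + capture(Level.WARNING, 3).length() + " chars");
        System.out.println("logged at SEVERE level: " + capture(Level.SEVERE, 3).length() + " chars");
    }
}
```

With commons-logging the same effect would come from configuring the underlying logging backend, which is what makes the warnings suppressible in an application.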

            People

            • Assignee: Unassigned
            • Reporter: Antoni Mylka (antoni.mylka)