Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8.0-incubator
    • Fix Version/s: 1.8.0
    • Component/s: Swing GUI
    • Labels:

      Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1259747
      Originally submitted by guzzil on 2005-08-15 02:40.

      When trying to extract images from a pdf, I get exceptions
      like

      Exception in thread "main" java.io.IOException: Unknown stream filter:COSName{JBIG2Decode}
          at org.pdfbox.filter.FilterManager.getFilter(FilterManager.java:116)
          at org.pdfbox.cos.COSStream.doDecode(COSStream.java:276)
          at org.pdfbox.cos.COSStream.doDecode(COSStream.java:240)
          at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:173)
          at org.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:205)
          at org.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:458)
          at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:131)
          at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:153)
          at org.pdfbox.pdmodel.graphics.xobject.PDXObjectImage.write2file(PDXObjectImage.java:117)
          at org.pdfbox.ExtractImages.extractImages(ExtractImages.java:169)
          at org.pdfbox.ExtractImages.main(ExtractImages.java:73)
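
The trace suggests that the filter lookup fails because no decoder is registered under the name JBIG2Decode. A minimal sketch of that kind of name-based lookup follows. The class FilterRegistrySketch, the StreamFilter interface and the "Identity" filter are hypothetical names made up for illustration; this is not the code of PDFBox's FilterManager.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a stream filter registry; NOT PDFBox's FilterManager. */
class FilterRegistrySketch {

    /** Hypothetical filter interface: turns encoded bytes into decoded bytes. */
    interface StreamFilter {
        byte[] decode(byte[] encoded) throws IOException;
    }

    private final Map<String, StreamFilter> filters = new HashMap<>();

    /** Registers a decoder under a filter name such as "JBIG2Decode". */
    void register(String name, StreamFilter filter) {
        filters.put(name, filter);
    }

    /** Looks up a decoder by name and fails loudly when none is registered. */
    StreamFilter getFilter(String name) throws IOException {
        StreamFilter filter = filters.get(name);
        if (filter == null) {
            throw new IOException("Unknown stream filter:" + name);
        }
        return filter;
    }

    public static void main(String[] args) {
        FilterRegistrySketch registry = new FilterRegistrySketch();
        // Assumption: a pass-through filter, only here so that one lookup succeeds.
        registry.register("Identity", encoded -> encoded.clone());
        try {
            registry.getFilter("JBIG2Decode");
        } catch (IOException e) {
            // Nothing was registered under this name, so the lookup throws.
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, supporting a new compression format means registering one more decoder under its filter name.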

      The pdfs are scanned images, which are afterwards
      optimized with Adobe Acrobat's "optimize" function.

      pdfimages from xpdf can extract the images.

      I can send you a pdf with this error (it is too big for an
      upload).

      [comment on SourceForge]
      Originally sent by benlitchfield.
      Logged In: YES
      user_id=601708

      yes please upload the pdf to ftp.pdfbox.org and I will take a
      look at it.

      Ben Litchfield

      1. jbig2_src.zip
        111 kB
        Jukka Zitting
      2. FilterManager.java.diff
        0.8 kB
        Kenneth Berland
      3. COSName.java.diff
        0.5 kB
        Kenneth Berland
      4. JBIG2Filter.java
        2 kB
        Kenneth Berland
      5. pdfbox-81.PDXObjectImage.patch
        0.9 kB
        Kenneth Berland
      6. sigice9_172.Adobe.pdf
        18 kB
        Tilman Hausherr
      7. sigice9_172.CVISION.pdf
        9 kB
        Tilman Hausherr

        Issue Links

          Activity

          Anonymous created issue -
          Andreas Lehmkühler added a comment -

          JBIG2 is a (rarely?) used compression format, especially for bi-level (b/w) images such as faxes or scans, and it is not yet supported by pdfbox.

          See also http://www.jpeg.org/jbig/jbigpt2.html

          Andreas Lehmkühler added a comment -

          Changed issue type from "Bug" to "New Feature"

          Andreas Lehmkühler made changes -
          Field Original Value New Value
          Priority Minor [ 4 ]
          Affects Version/s 0.8.0-incubator [ 12313346 ]
          Issue Type Bug [ 1 ] New Feature [ 2 ]
          Component/s PDFReader [ 12312227 ]
          Andreas Lehmkühler added a comment -

          I've found a Java implementation of a JBIG2 decompressor. It's distributed under a BSD license. I guess we have to take a closer look at it.

          http://support.idrsolutions.com/default.asp?W80

          Andreas Lehmkühler made changes -
          Link This issue relates to PDFBOX-332 [ PDFBOX-332 ]
          Andreas Lehmkühler made changes -
          Link This issue relates to PDFBOX-229 [ PDFBOX-229 ]
          Jukka Zitting added a comment -

          The idrsolutions link above doesn't seem to work anymore. I believe the code nowadays lives at http://www.jpedal.org/support_JBIG.php. To avoid losing the code again, I've attached the BSD-licensed source jar.

          Jukka Zitting made changes -
          Attachment jbig2_src.zip [ 12452920 ]
          Kenneth Berland made changes -
          Link This issue relates to PDFBOX-554 [ PDFBOX-554 ]
          Kenneth Berland added a comment -

          The same fix works for both problems.

          Kenneth Berland added a comment -

          This can be fixed using http://code.google.com/p/jbig2-imageio/

          and

          creating org/apache/pdfbox/filter/JBIG2Filter.java and
          modifying org/apache/pdfbox/cos/COSName.java and org/apache/pdfbox/filter/FilterManager.java with the attached patches.
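
As a rough illustration of the approach described above, a filter could hand the encoded bytes to whatever JBIG2 reader is registered with javax.imageio. The sketch below is not the attached JBIG2Filter.java; the class name Jbig2DecodeSketch is hypothetical. It assumes an ImageIO plugin that registers itself under the format name "JBIG2" is on the classpath, and it does not handle JBIG2Globals or verify that a given plugin accepts embedded streams.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical sketch; NOT the JBIG2Filter attached to this issue. */
class Jbig2DecodeSketch {

    /** Decodes the bytes with the first registered JBIG2 reader, if there is one. */
    static BufferedImage decode(byte[] encoded) throws IOException {
        // Assumption: the plugin registers itself under the format name "JBIG2".
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("JBIG2");
        if (!readers.hasNext()) {
            throw new IOException("No ImageIO plugin registered for JBIG2");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            if (in == null) {
                throw new IOException("Could not create an ImageInputStream");
            }
            reader.setInput(in);
            return reader.read(0); // first (and usually only) image in the stream
        } finally {
            reader.dispose(); // release resources held by the reader
        }
    }

    public static void main(String[] args) {
        try {
            BufferedImage image = decode(new byte[0]);
            System.out.println("Decoded image: " + image);
        } catch (IOException e) {
            // Without a plugin on the classpath this branch is taken.
            System.out.println("Decoding failed: " + e.getMessage());
        }
    }
}
```

Failing with an exception when no reader is found keeps the missing dependency visible instead of silently producing an empty image.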

          Kenneth Berland made changes -
          Attachment FilterManager.java.diff [ 12454907 ]
          Attachment COSName.java.diff [ 12454908 ]
          Attachment JBIG2Filter.java [ 12454909 ]
          Kenneth Berland added a comment -

          I've also found that a hint needs to be given for the color space of these images.
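
The patch itself is not reproduced here. As background on why the color space matters, the sketch below shows one way 1-bit-per-pixel data can be wrapped in a Java image with a two-color palette. The class BilevelImageSketch and the method fromPackedBits are hypothetical names, and the sketch assumes rows padded to whole bytes with the most significant bit first; it is not taken from pdfbox-81.PDXObjectImage.patch.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;

/** Hypothetical sketch; NOT the code of the attached PDXObjectImage patch. */
class BilevelImageSketch {

    /**
     * Wraps packed 1-bit pixels in a TYPE_BYTE_BINARY image.
     * Assumption: each row is padded to a whole number of bytes, MSB first.
     */
    static BufferedImage fromPackedBits(byte[] packed, int width, int height) {
        int stride = (width + 7) / 8; // bytes per row
        if (packed.length < stride * height) {
            throw new IllegalArgumentException("not enough pixel data");
        }
        // TYPE_BYTE_BINARY uses a two-entry palette: index 0 is black, index 1 is white.
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        byte[] target = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
        System.arraycopy(packed, 0, target, 0, stride * height);
        return image;
    }

    public static void main(String[] args) {
        // One row of 8 pixels with bit pattern 10100000.
        BufferedImage image = fromPackedBits(new byte[] {(byte) 0xA0}, 8, 1);
        System.out.println(Integer.toHexString(image.getRGB(0, 0))); // prints "ffffffff"
        System.out.println(Integer.toHexString(image.getRGB(1, 0))); // prints "ff000000"
    }
}
```

If the same bytes were interpreted as 8-bit gray or RGB samples, the image dimensions and colors would come out wrong, which is why the decoder output has to be paired with the right color space.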

          Kenneth Berland made changes -
          Attachment pdfbox-81.PDXObjectImage.patch [ 12455093 ]
          Andreas Lehmkühler added a comment -

          I committed the proposed patch in revision 999475.

          Andreas Lehmkühler added a comment -

          How should we proceed?

          We can't bundle the mentioned google plugin as it is licensed under GPLv3. Should we just add a pointer to that plugin to the pdfbox documentation, or should we add the jpedal plugin?

          If we choose to add the jpedal plugin, how should we add it:

          • add a compiled jar to svn
          • add the source to svn

          As the plugin does not seem to be maintained, I'd prefer to add the source, so that we are able to improve it. But how can I do that? Just exchange the old header with the AL header, change the package names and add a comment to the LICENSE.txt/NOTICE.txt? Or are we not allowed to do that?

          Find attached the BSD-like license header, which can be found in every source file.

          /**
           * ===========================================
           * Java Pdf Extraction Decoding Access Library
           * ===========================================
           *
           * Project Info: http://www.jpedal.org
           * (C) Copyright 1997-2008, IDRsolutions and Contributors.
           * Main Developer: Simon Barnett
           *
           * This file is part of JPedal
           *
           * Copyright (c) 2008, IDRsolutions
           * All rights reserved.
           *
           * Redistribution and use in source and binary forms, with or without
           * modification, are permitted provided that the following conditions are met:
           *     * Redistributions of source code must retain the above copyright
           *       notice, this list of conditions and the following disclaimer.
           *     * Redistributions in binary form must reproduce the above copyright
           *       notice, this list of conditions and the following disclaimer in the
           *       documentation and/or other materials provided with the distribution.
           *     * Neither the name of the IDRsolutions nor the
           *       names of its contributors may be used to endorse or promote products
           *       derived from this software without specific prior written permission.
           *
           * THIS SOFTWARE IS PROVIDED BY IDRsolutions ``AS IS'' AND ANY
           * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
           * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
           * DISCLAIMED. IN NO EVENT SHALL IDRsolutions BE LIABLE FOR ANY
           * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
           * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
           * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
           * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
           * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
           * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
           *
           * Other JBIG2 image decoding implementations include
           * jbig2dec ( http://jbig2dec.sourceforge.net/ )
           * xpdf ( http://www.foolabs.com/xpdf/ )
           *
           * The final draft JBIG2 specification can be found at http://www.jpeg.org/public/fcd14492.pdf
           *
           * All three of the above resources were used in the writing of this software, with methodologies,
           * processes and inspiration taken from all three.
          Tilman Hausherr added a comment - - edited

          I have a similar problem with 1.4.0, some PDF files are JBIG2 encoded and all I get when extracting the images are white pages, not even a log error or an exception. Including the levigo-jbig2-imageio-1.1.jar in the lib list doesn't help. Same when using the jbig2.jar from the JPedal site. Same when copying it in the lib/ext directory of the JRE.

          About the "how should we proceed" question - why not document it on the dependency page?
          http://pdfbox.apache.org/dependencies.html

          Tilman Hausherr added a comment -

          Two JBIG2 PDF files downloaded from
          http://jbig2.com/jb2com_technical_advantages.html

          Tilman Hausherr made changes -
          Attachment sigice9_172.Adobe.pdf [ 12470124 ]
          Attachment sigice9_172.CVISION.pdf [ 12470125 ]
          Andreas Lehmkühler made changes -
          Link This issue relates to PDFBOX-1067 [ PDFBOX-1067 ]
          Andreas Lehmkühler made changes -
          Labels JBIG2
          Tilman Hausherr added a comment -

          Doesn't work with version 1.7.1 with the jbig2.jar file. However now there's an error message, although the function page.convertToImage() doesn't fail:
          26.07.2012 12:21:13.521 ERROR [main] org.apache.pdfbox.filter.JBIG2Filter:73 - Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.
          26.07.2012 12:21:13.521 ERROR [main] org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap:186 - Something went wrong ... the pixelmap doesn't contain any data.
          26.07.2012 12:21:13.521 WARN [main] org.apache.pdfbox.util.operator.pagedrawer.Invoke:86 - getRGBImage returned NULL

          When using levigo-jbig2-imageio-1.4.1.jar I get this:
          26.07.2012 12:50:46.677 ERROR [main] org.apache.pdfbox.filter.JBIG2Filter:77 - Something went wrong when decoding the JBIG2 encoded datastream.
          26.07.2012 12:50:46.677 ERROR [main] org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap:186 - Something went wrong ... the pixelmap doesn't contain any data.
          26.07.2012 12:50:46.677 WARN [main] org.apache.pdfbox.util.operator.pagedrawer.Invoke:86 - getRGBImage returned NULL
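
When the log reports that no ImageIO plugin can be found, one quick check is to ask javax.imageio which reader formats it currently knows about. The sketch below does only that. The class name ImageIoPluginCheck is hypothetical, and "JBIG2" as the registered format name is an assumption about the plugin in use.

```java
import java.util.Arrays;
import java.util.TreeSet;
import javax.imageio.ImageIO;

/** Hypothetical diagnostic helper; not part of PDFBox. */
class ImageIoPluginCheck {

    /** Returns true if some registered ImageIO reader claims the given format name. */
    static boolean hasReaderFor(String formatName) {
        // Rescan the classpath in case plugin jars became visible after ImageIO was initialized.
        ImageIO.scanForPlugins();
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // Sorted and de-duplicated list of all format names with a registered reader.
        System.out.println("Registered reader formats: "
                + new TreeSet<>(Arrays.asList(ImageIO.getReaderFormatNames())));
        // Assumption: a JBIG2 plugin would register under the name "JBIG2".
        System.out.println("JBIG2 reader available: " + hasReaderFor("JBIG2"));
    }
}
```

If the JBIG2 entry is missing from the list, the jar is most likely not visible to the class loader that ImageIO scans. If it is present and decoding still fails, the problem lies in the data handed to the reader.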

          Andreas Lehmkühler added a comment -

          The JPedal plugin seems to be a dead end as it doesn't support JBIG2Globals. PDFBOX-1067 provides a working solution.

          Set to resolved.

          Andreas Lehmkühler made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Assignee Andreas Lehmkühler [ lehmi ]
          Fix Version/s 1.8.0 [ 12321650 ]
          Resolution Fixed [ 1 ]
          Andreas Lehmkühler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Andreas Lehmkühler
              Reporter:
              Anonymous
            • Votes:
              3
              Watchers:
              2
