PDFBox / PDFBOX-4642

I'd like to know about the dependencies of PDFBox (2.0.12.0)


Details

    • Type: Wish
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 2.0.12
    • Fix Version/s: None
    • Component/s: Text extraction
    • Labels: None

    Description

      We have built a .NET version of PDFBox 2.0.12.0 using IKVM, and we use it to extract text and form fields.

      We currently reference the following dependencies:

      BCProv.JDK15on
      Commons.Logging
      Commons.Logging.Javadoc
      DiffUtils
      Fontbox
      HamcREST.Core
      IKVM.OpenJDK.Core
      IKVM.OpenJDK.Security
      IKVM.OpenJDK.SwingAWT
      IKVM.OpenJDK.Text
      IKVM.OpenJDK.Util
      IKVM.Reflection
      IKVM.Runtime
      jcl-over-slf4j-1.7.6

       

      Recently we hit an error while extracting text from a PDF (see the stack trace below):

      System.IO.FileNotFoundException: Could not load file or assembly 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58' or one of its dependencies. The system cannot find the file specified.

      File name: 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58'

      at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(InputStream , OutputStream , Int32 )

      at org.apache.pdfbox.filter.LZWFilter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index)

      at org.apache.pdfbox.filter.Filter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index, DecodeOptions options)

      at org.apache.pdfbox.cos.COSInputStream.create(List , COSDictionary , InputStream , ScratchFile , DecodeOptions )

      at org.apache.pdfbox.cos.COSStream.createInputStream(DecodeOptions options)

      at org.apache.pdfbox.cos.COSStream.createInputStream()

      at org.apache.pdfbox.pdmodel.PDPage.getContents()

      at org.apache.pdfbox.pdfparser.PDFStreamParser..ctor(PDContentStream contentStream)

      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDContentStream )

      at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDContentStream )

      at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDPage page)

      at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(PDPage )

      at org.apache.pdfbox.text.PDFTextStripper.processPage(PDPage page)

      at org.apache.pdfbox.text.PDFTextStripper.processPages(PDPageTree pages)

      at org.apache.pdfbox.text.PDFTextStripper.writeText(PDDocument doc, Writer outputStream)

      at org.apache.pdfbox.text.PDFTextStripper.getText(PDDocument doc)
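
The top frame, LZWFilter.doLZWDecode, hints at why a media assembly is involved. In the PDFBox 2.0.x Java sources, LZWFilter appears to read its variable-width LZW codes through javax.imageio.stream.MemoryCacheImageInputStream, and the working assumption here is that IKVM packages javax.imageio in IKVM.OpenJDK.Media. The sketch below is not PDFBox code: BitReadSketch and readCodes are hypothetical names, and it only illustrates the kind of bit-level read that would pull in that JDK class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical illustration, not taken from PDFBox.
class BitReadSketch {
    // Reads `count` codes of `width` bits each, most significant bit first.
    static long[] readCodes(byte[] data, int width, int count) {
        long[] codes = new long[count];
        try (MemoryCacheImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                // readBits is the javax.imageio call that needs the media classes.
                codes[i] = in.readBits(width);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return codes;
    }
}
```

With the bytes 0x80 0x40 0x00 and a width of 9, both codes come out as 256. If the assumption above holds, any LZW-compressed content stream would trigger the same assembly load, even when the application never touches images.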

       

      We managed to get text extraction working after adding these two DLLs to the folder containing the PDFBox DLL:

      IKVM.OpenJDK.Media.dll 
      IKVM.AWT.WinForms.dll
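
One way to catch this kind of failure before it surfaces as a FileNotFoundException at runtime is to compare the files in the deployment folder against a list of assembly names the application expects. The following is a minimal sketch under that assumption. AssemblyCheck and missing are hypothetical names, and the required list contains only the two DLLs named above; it is not a complete dependency list for PDFBox.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only.
class AssemblyCheck {
    // Returns the required names that are absent from `present`.
    // Comparison ignores case, matching Windows file name behaviour.
    static List<String> missing(Collection<String> present, List<String> required) {
        Set<String> seen = new HashSet<>();
        for (String name : present) {
            seen.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> result = new ArrayList<>();
        for (String name : required) {
            if (!seen.contains(name.toLowerCase(Locale.ROOT))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Folder holding the PDFBox DLL; defaults to the current directory.
        File folder = new File(args.length > 0 ? args[0] : ".");
        String[] names = folder.list(); // null when the path is not a directory
        List<String> present = names == null ? new ArrayList<>() : Arrays.asList(names);
        // Illustrative list: only the two assemblies mentioned in this report.
        List<String> required = Arrays.asList("IKVM.OpenJDK.Media.dll", "IKVM.AWT.WinForms.dll");
        System.out.println("Missing: " + missing(present, required));
    }
}
```

A file-presence check like this does not verify assembly versions or transitive references, so it can only flag the simplest case of a DLL that was never copied.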

       

      Later we searched for information about the dependencies and found this site: http://www.squarepdf.net/pdfbox-in-net

      We are also attaching a zip from it.

       

      The zip contains many other DLLs that we do not currently reference.

      So we are wondering: do we need all of these DLLs, or only specific ones?

      If possible, could you also briefly describe how the different DLLs are used, and what kinds of problems can occur when one of them is missing?

       

       

       

      Attachments

        1. PDFBox.NET-1.8.9.zip
          22.71 MB
          Amit Maheshwari

          People

            Assignee: Unassigned
            Reporter: Amit Maheshwari (anutural)