Details
Type: Wish
Status: Closed
Priority: Major
Resolution: Won't Do
Affects Version/s: 2.0.12
Fix Version/s: None
Component/s: None
Description
We have built a .NET version of PdfBox 2.0.12.0 using IKVM, and we are using it to extract text and form fields.
Currently we are using the following dependencies:
BCProv.JDK15on
Commons.Logging
Commons.Logging.Javadoc
DiffUtils
Fontbox
HamcREST.Core
IKVM.OpenJDK.Core
IKVM.OpenJDK.Security
IKVM.OpenJDK.SwingAWT
IKVM.OpenJDK.Text
IKVM.OpenJDK.Util
IKVM.Reflection
IKVM.Runtime
jcl-over-slf4j-1.7.6
Recently, however, we ran into an issue while extracting the text from a PDF (see the stack trace below):
System.IO.FileNotFoundException: Could not load file or assembly 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58' or one of its dependencies. The system cannot find the file specified.
File name: 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58'
at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(InputStream , OutputStream , Int32 )
at org.apache.pdfbox.filter.LZWFilter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index)
at org.apache.pdfbox.filter.Filter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index, DecodeOptions options)
at org.apache.pdfbox.cos.COSInputStream.create(List , COSDictionary , InputStream , ScratchFile , DecodeOptions )
at org.apache.pdfbox.cos.COSStream.createInputStream(DecodeOptions options)
at org.apache.pdfbox.cos.COSStream.createInputStream()
at org.apache.pdfbox.pdmodel.PDPage.getContents()
at org.apache.pdfbox.pdfparser.PDFStreamParser..ctor(PDContentStream contentStream)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDContentStream )
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDContentStream )
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDPage page)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(PDPage )
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDPage page)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDPageTree pages)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDDocument doc, Writer outputStream)
at org.apache.pdfbox.text.PDFTextStripper.getText(PDDocument doc)
We managed to get text extraction working after adding these two DLLs to the folder where the PdfBox DLL resides:
IKVM.OpenJDK.Media.dll
IKVM.AWT.WinForms.dll
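The stack trace fails inside LZWFilter.doLZWDecode. As far as we can tell, PDFBox 2.0.x reads the variable-width LZW codes through javax.imageio.stream.MemoryCacheImageInputStream.readBits, and we assume that under IKVM the javax.imageio classes live in IKVM.OpenJDK.Media.dll, which would explain why that assembly is needed even for plain text extraction of LZW-compressed content. The sketch below is not PDFBox's code; it only reproduces that bit-reading pattern for 9-bit codes. The class name LzwCodeReaderSketch and the sample bytes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical sketch: reads fixed-width LZW codes with the javax.imageio
// bit-reading API that LZWFilter appears to rely on. On IKVM, touching
// javax.imageio is what we assume pulls in IKVM.OpenJDK.Media.dll.
public class LzwCodeReaderSketch {
    static final int EOD = 257; // LZW end-of-data code

    // Reads codes of the given bit width (most significant bit first)
    // until an EOD code or the end of the input, whichever comes first.
    public static List<Integer> readCodes(byte[] data, int width) throws IOException {
        List<Integer> codes = new ArrayList<>();
        try (MemoryCacheImageInputStream in =
                new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                int code = (int) in.readBits(width);
                if (code == EOD) {
                    break;
                }
                codes.add(code);
            }
        } catch (EOFException e) {
            // Ran out of bits before an EOD code; keep what was read so far.
        }
        return codes;
    }

    public static void main(String[] args) throws IOException {
        // 9-bit codes 256 (clear table), 65 ('A'), 257 (EOD), zero-padded to 4 bytes.
        byte[] packed = {(byte) 0x80, 0x10, 0x60, 0x20};
        System.out.println(readCodes(packed, 9)); // prints [256, 65]
    }
}
```

If this is the cause, any PDF whose content streams use /LZWDecode would hit the same missing-assembly error, while Flate-compressed files would not.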
Later we searched for information about the dependencies and found this site: http://www.squarepdf.net/pdfbox-in-net
I am also attaching a zip of it.
We found a lot of other DLLs there that we are not currently using.
So I was wondering: do we need all of these DLLs, or only specific ones?
Also, if possible, could we get brief information about how the different DLLs are used (i.e. what kind of problems can occur if they are left out)?
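A rough way to check this for our own build, instead of shipping every DLL from the squarepdf.net bundle, could be a startup probe that tries to load one representative Java class per feature and reports which ones fail. The class name DependencyProbeSketch and the feature-to-class table are hypothetical and reflect only our own reading of where PDFBox 2.0 uses each package; which IKVM assembly contains each package is likewise an assumption. A successful probe only shows that the class resolves, not that every code path using it works.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic: tries to load one representative class per
// optional feature, without initializing it, and reports which ones fail.
public class DependencyProbeSketch {

    // Returns the class names that could not be loaded.
    public static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = DependencyProbeSketch.class.getClassLoader();
        for (String name : classNames) {
            try {
                Class.forName(name, false, loader);
            } catch (Throwable t) {
                // Catch Throwable rather than only ClassNotFoundException:
                // under IKVM a missing assembly may surface as a different
                // error type (assumption).
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Feature -> representative class. Labels are our own guesses,
        // not official PDFBox documentation.
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("LZW decoding (javax.imageio)", "javax.imageio.stream.MemoryCacheImageInputStream");
        probes.put("Images/rendering (java.awt)", "java.awt.image.BufferedImage");
        probes.put("Public-key encryption (Bouncy Castle)", "org.bouncycastle.jce.provider.BouncyCastleProvider");
        probes.put("Logging (commons-logging)", "org.apache.commons.logging.LogFactory");

        List<String> missing = missingClasses(new ArrayList<>(probes.values()));
        for (Map.Entry<String, String> e : probes.entrySet()) {
            String status = missing.contains(e.getValue()) ? "MISSING" : "ok";
            System.out.println(status + "  " + e.getKey() + "  [" + e.getValue() + "]");
        }
    }
}
```

Running this once with the reduced DLL set would at least show which feature areas would fail at runtime, which is the kind of problem we are asking about.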