[PDFBOX-4642] I'd like to know about the dependencies of PDF Box (2.0.12.0) - ASF JIRA

Details

Type: Wish
Status: Closed
Priority: Major
Resolution: Won't Do
Affects Version/s: 2.0.12
Fix Version/s: None
Component/s: Text extraction
Labels: None

Description

We have built a .NET version of PDFBox 2.0.12.0 using IKVM, and we are using it to extract text and form fields.

Currently we have the following dependencies:

BCProv.JDK15on
Commons.Logging
Commons.Logging.Javadoc
DiffUtils
Fontbox
Hamcrest.Core
IKVM.OpenJDK.Core
IKVM.OpenJDK.Security
IKVM.OpenJDK.SwingAWT
IKVM.OpenJDK.Text
IKVM.OpenJDK.Util
IKVM.Reflection
IKVM.Runtime
jcl-over-slf4j-1.7.6

Recently we ran into an issue while extracting text from a PDF (see the stack trace below):

System.IO.FileNotFoundException: Could not load file or assembly 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58' or one of its dependencies. The system cannot find the file specified.
File name: 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58'
   at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(InputStream , OutputStream , Int32 )
   at org.apache.pdfbox.filter.LZWFilter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index)
   at org.apache.pdfbox.filter.Filter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index, DecodeOptions options)
   at org.apache.pdfbox.cos.COSInputStream.create(List , COSDictionary , InputStream , ScratchFile , DecodeOptions )
   at org.apache.pdfbox.cos.COSStream.createInputStream(DecodeOptions options)
   at org.apache.pdfbox.cos.COSStream.createInputStream()
   at org.apache.pdfbox.pdmodel.PDPage.getContents()
   at org.apache.pdfbox.pdfparser.PDFStreamParser..ctor(PDContentStream contentStream)
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDContentStream )
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDContentStream )
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDPage page)
   at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(PDPage )
   at org.apache.pdfbox.text.PDFTextStripper.processPage(PDPage page)
   at org.apache.pdfbox.text.PDFTextStripper.processPages(PDPageTree pages)
   at org.apache.pdfbox.text.PDFTextStripper.writeText(PDDocument doc, Writer outputStream)
   at org.apache.pdfbox.text.PDFTextStripper.getText(PDDocument doc)

We managed to get text extraction working after adding these two DLLs to the folder where the PDFBox DLL resides:

IKVM.OpenJDK.Media.dll
IKVM.AWT.WinForms.dll
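
A check like the one described above (are the expected DLLs actually sitting next to the PDFBox DLL?) could be sketched as follows. This is only an illustration: the class name `AssemblyCheck` and the method `findMissing` are hypothetical, and the list of names is simply the IKVM assemblies mentioned in this issue, not a confirmed complete set of what PDFBox needs under IKVM.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which expected assemblies are absent from a folder.
class AssemblyCheck {

    // Returns the names from assemblyNames that have no "<name>.dll" file in dir.
    static List<String> findMissing(Path dir, List<String> assemblyNames) {
        List<String> missing = new ArrayList<>();
        for (String name : assemblyNames) {
            if (!Files.isRegularFile(dir.resolve(name + ".dll"))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Folder to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");

        // Assumption: only the IKVM assemblies named in this issue are listed here.
        List<String> expected = Arrays.asList(
                "IKVM.OpenJDK.Core",
                "IKVM.OpenJDK.Security",
                "IKVM.OpenJDK.SwingAWT",
                "IKVM.OpenJDK.Text",
                "IKVM.OpenJDK.Util",
                "IKVM.OpenJDK.Media",
                "IKVM.AWT.WinForms",
                "IKVM.Reflection",
                "IKVM.Runtime");

        for (String name : findMissing(dir, expected)) {
            System.out.println("Missing: " + name + ".dll");
        }
    }
}
```

Running it against the deployment folder before the first extraction call would surface a missing assembly up front, rather than as a `FileNotFoundException` partway through processing a page.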

Later we searched for information about the dependencies and found this site: http://www.squarepdf.net/pdfbox-in-net

I am also attaching a zip of it.

We found a lot of other DLLs there that we are not currently using.

So I was wondering whether we need all of these DLLs or only specific ones.

Also, if possible, could we have brief information about how the different DLLs are used (what kinds of problems can occur if they are not included)?

Attachments

PDFBox.NET-1.8.9.zip (22.71 MB), 04/Sep/19 08:48, Amit Maheshwari

People

Assignee: Unassigned

Reporter: Amit Maheshwari

Votes: 0

Watchers: 2

Dates

Created: 04/Sep/19 08:50

Updated: 04/Sep/19 17:34

Resolved: 04/Sep/19 17:34