Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: 1.8.13
Description
I converted the attached Word document "DummyDoc.docx" to PDF.
I then used PDFBox 1.8 to extract the text:
java -jar pdfbox-app-1.8.13.jar ExtractText "DummyDoc.pdf" txt.txt
The generated output is:
Dummy document for tag extraction
Section 1
DummyTagOne_01
This is text body one
DummyTagOne_02
This is text body two
Section 2
DummyTagTwo_01
This is text body three
DummyTagTwo_02
This is text body four
DummyTagTwo_03
This is text body five
As you can see, the output contains "This is text body one " instead of "This is text body one ", and so on.
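The two quoted strings above look identical as rendered here, so the difference is presumably in whitespace (extra, missing, or non-breaking spaces). That is an assumption, since the report does not say. Under that assumption, a small helper like the sketch below could make the whitespace in txt.txt visible for comparison. The class and method names are hypothetical and are not part of PDFBox.

```java
class WhitespaceReveal {
    // Replace every whitespace-like character with a visible tag so that
    // two strings that look identical on screen can be compared.
    static String reveal(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp == ' ') {
                out.append("[SP]");
            } else if (cp == '\t') {
                out.append("[TAB]");
            } else if (cp == 0x00A0) {
                out.append("[NBSP]"); // non-breaking space
            } else if (Character.isWhitespace(cp) || Character.isSpaceChar(cp)) {
                // Any other whitespace (line breaks, em space, ...) as a code point
                out.append(String.format("[U+%04X]", cp));
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample line taken from the extracted output in this report
        System.out.println(reveal("This is text body one "));
    }
}
```

Running each line of the extracted text through reveal and doing the same for the expected text would show exactly which characters differ, if any.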