Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
[Issue from SourceForge]
http://sourceforge.net/tracker/index.php?func=detail&aid=1922502&group_id=78314&atid=552832
I am using PDF-Box-0.7.3.dll with C# and have tested text extraction on two
searchable PDFs that I scanned in from paper. In both files, the spaces
between words are missing from the extracted text. I also tested another PDF
file (which I downloaded from the internet), and it was parsed correctly.
Unfortunately, the scanned file is 1.2 MB and the upload was blocked. Please
send me an email (gkobzeff@hotmail.com) and I will reply with the file.
Thanks for looking into this.
Greg
[Comment on SourceForge]
Date: 2008-03-23 21:24
Sender: gkobzeff
Logged In: YES
user_id=2042611
Originator: YES
I have rescanned the file at a smaller file size and attached it.
Thanks
File Added: Advanced Pain Mgmt BW.pdf
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&file_id=271548&aid=1922502
Attachments
Issue Links
- is duplicated by
  - PDFBOX-61 Spaces in extracted file (Closed)
  - PDFBOX-77 PDF-Extraction splits words by spaces (Closed)
  - PDFBOX-347 Spaces removed after text extraction (Closed)
- is related to
  - PDFBOX-80 Does not convert spacing. gourps words (Closed)
  - PDFBOX-146 Document does not separate words (Closed)
  - PDFBOX-234 spaces lost (Closed)