Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- None
- None
Description
[imported from SourceForge]
http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1208824
Originally submitted by nobody on 2005-05-25 16:40.
While trying to integrate with Lucene, I ran into
problems. The Lucene people suggested that I check
the output of the extract utility against one of my test PDFs.
When I did, I saw spaces placed inside many of the
words. I was on version 0.7.0, so I downloaded 0.7.1
and saw the same results.
One of the test files where I see this issue is attached.
[attachment on SourceForge]
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1208824&file_id=135995
Tom_3.pdf (application/pdf), 10145 bytes
Test pdf file.
Attachments
Issue Links
- duplicates PDFBOX-349 Spaces between words ignored in scanned pdf files (Closed)