Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Not A Problem
-
1.8.0
-
None
-
None
-
Linux Ubuntu 64 bit, Java
Description
Hello,
Preamble: We are glad to use PDFBox and I personally grateful to all developers who sustain this project. It is good work, guys!
We have one problem. For our application purposes we extract from pdf "char by char" with rispective coordinates for each char. (see attached Parser)
After this we group chars into the words. We noticed that for some pdf documents we have a strange "offset" for extracted rect coordinates. (see screens)
The offset is seems to be incremental (not sure) - at left top corner of document is near to real coordinates of character, but at right bottom corner is near to 0.5 cm..
If I make selection in Adobe Reader - it seems all ok.
I attached two pdf files with offset to this post.
If you want to see the offset "in action" you can use our service to do it at http://pdf2data.cloudforpeople.com/ (Please do not consider it as advertising)
Please can you test these files and tell me if it is a really bug?
How we can resolve it?
Thanks,
Vitalie