Issue 119312 - PDF import has very poor layout accuracy
Status: CONFIRMED
Alias: None
Product: extensions
Classification: Extensions
Component: pdfimport
Version: current
Hardware: PC All
Importance: P3 Normal
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-05 10:13 UTC by fkbreitl
Modified: 2013-02-07 22:36 UTC
CC List: 4 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
One of many PDF documents that gets screwed up (117.76 KB, application/octetstream)
2012-05-05 10:13 UTC, fkbreitl

Description fkbreitl 2012-05-05 10:13:03 UTC
Created attachment 77514
One of many PDF documents that gets screwed up

I import a PDF document into Draw and the result looks very different from the original.

Steps to reproduce:
1. Start LibreOffice
2. Open the PDF document attached

Current behavior:
Several lines get stacked on each other.

Expected behavior:
The imported document should look like the original PDF.

Platform (if different from the browser): 
Ubuntu 12.04, LibreOffice 3.5.2.2.
But it also happens in Windows 7, OpenOffice 3.3.0
Browser: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Comment 1 Andre 2012-05-07 08:41:32 UTC
@fkbreitl: Can you check that with Apache OpenOffice as well?
(This is bugzilla of Apache OpenOffice, not of LibreOffice)
Comment 2 fkbreitl 2012-05-07 08:53:10 UTC
(In reply to comment #1)

As I stated above:
> But it also happens in Windows 7, OpenOffice 3.3.0
Comment 3 fkbreitl 2012-05-07 08:56:26 UTC
I also reported it here: https://bugs.freedesktop.org/show_bug.cgi?id=49431
Comment 4 Andre 2012-05-07 09:00:25 UTC
Did you check Apache OpenOffice 3.4 (which contains a lot of bug fixes from OpenOffice 3.4)?
Comment 5 fkbreitl 2012-05-07 09:13:15 UTC
(In reply to comment #4)
> Did you check Apache OpenOffice 3.4 (which contains a lot of bug fixes from
> OpenOffice 3.4)?

I didn't check prereleases, but it's very unlikely that the bug disappeared, especially since it's still in LibreOffice 3.4.
Comment 6 Andre 2012-05-07 09:58:44 UTC
Let's wait for the upcoming release of AOO 3.4 and then test whether the bug is still there.
Comment 7 fkbreitl 2012-05-07 10:18:41 UTC
With this attitude I assume OO will keep its poor conditions for decades.
Comment 8 Andre 2012-05-07 11:19:40 UTC
I beg your pardon, what attitude?
Comment 9 fkbreitl 2012-05-07 12:35:47 UTC
Instead of hoping in vain for the bug to miraculously disappear in the next release, the bug should be confirmed, and that should result in immediate action to get it fixed and closed in a future release.

If there is a convincing reason to believe that this bug is resolved in the pre-release (i.e. somebody has been working on it or at least on the PDF extension), I will consider testing it. Otherwise such a test is just a waste of my time with no benefit for the project.
Moreover, testing pre-releases is the job of the developers, who have those versions already installed and can do it much more efficiently than the users.
Comment 10 Andre 2012-05-07 12:52:56 UTC
I do not want to rely on hopes and assumptions.  Before I (or probably somebody else) start to fix this bug I have to make sure that it still exists.  Anything else would be a waste of my time.  There are newer versions of both LibreOffice (currently 3.5.3) and Apache OpenOffice (3.4 developer builds).

And since we are talking about attitude: if I (or another developer) spend my time on fixing this issue, is it really such an outrageous request to ask you (or anybody else who is interested in getting this fixed) to check that this bug still exists in current versions?  Please keep in mind that fixing this will probably take much more time than testing it.
Comment 11 fkbreitl 2012-05-07 13:16:55 UTC
I agree that fixing is much more work and it's highly appreciated.

However, testing is work too, and there will always be a later version.
The OpenOffice web page advertises 3.3.0 as latest stable version.
For tests of anything newer I see the responsibility on the developer side.
However, I am even willing to help out here, if there are convincing reasons for it and people start working with me.

However, I am reluctant to test for no good reason, since from my experience with OO I know that even well-known, reported and voted-for bugs are carried over from release to release.
Comment 12 vandertim 2012-05-08 19:13:49 UTC
Bug #119198 and possible solution to this related PDF Error within Apache OpenOffice 3.4:

https://issues.apache.org/ooo/show_bug.cgi?id=119198
Comment 13 Dave Fisher 2012-05-08 20:43:59 UTC
In your case I have verified that AOO 3.4 does render page 2 and page 3 imperfectly. I tested a Mac version.

Looking at the attached PDF, I see that the original is a Word document, that the file was produced by a Mac OS X 10.5.8 Quartz PDFContext, and that it is a version 1.6 PDF. There are embedded font subsets of Windows standard fonts.

I extracted the awful page 3 to a separate page using Acrobat, and the import in AOO 3.4 was just as bad.

It is very much a non-trivial task to re-assemble the text strings from a PDF into usable text blocks. Remember that the PDF file format was designed as digital paper.
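
To illustrate why this is hard, here is a minimal Python sketch. It is not the actual AOO pdfimport code. It assumes the PDF has already been decoded into positioned text fragments; the `Fragment` type, the `group_into_lines` function and the `y_tolerance` parameter are hypothetical names made up for this example. The sketch only groups fragments into lines by comparing baselines, which is one small part of rebuilding text blocks.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A positioned run of text (hypothetical type for this sketch)."""
    x: float   # left edge, in points
    y: float   # baseline, in points (PDF y axis grows upward)
    text: str


def group_into_lines(fragments, y_tolerance=2.0):
    """Group fragments into text lines by baseline proximity.

    A fragment joins the current line if its baseline is within
    y_tolerance of the baseline of that line's first fragment.
    """
    lines = []
    # Sort by descending y so lines come out top to bottom.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, order fragments left to right and join them.
    return [
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for line in lines
    ]


# Fragments arrive in arbitrary order, with slightly differing baselines.
fragments = [
    Fragment(120.0, 700.0, "world"),
    Fragment(72.0, 700.5, "Hello"),
    Fragment(72.0, 686.0, "second"),
    Fragment(115.0, 686.0, "line"),
]

print(group_into_lines(fragments))
```

Even this toy version depends on a guessed threshold. With a tolerance that is too large (for example `y_tolerance=20.0` on the sample data), both lines collapse into one, which is one plausible way for an importer to end up with lines stacked on each other.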

With your example the next developer who attempts to fix PDF import will have another example to use.

Meanwhile if you have the original Word document, how does Writer handle that?