32970 – incorrect elision of word space in czech text

Status:	NEW

Alias:	None

Product:	Fop - Now in Jira
Classification:	Unclassified
Component:	pdf
Version:	0.20.5
Hardware:	PC Windows 2000

Importance:	P3 normal
Target Milestone:	---
Assignee:	fop-dev

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-01-06 19:19 UTC by petrmk
Modified:	2012-04-04 23:02 UTC
CC List:	0 users

Attachments
Example of error. Interesting is that this error is accompanied by formatting problem to which arrow is pointing (13.39 KB, image/gif) 2005-01-06 19:21 UTC, petrmk
Example of error. Please read readme.txt (416.12 KB, application/x-zip-compressed) 2005-01-07 12:20 UTC, petrmk
font files used in example - part 1 (949.90 KB, application/x-zip-compressed) 2005-01-07 12:21 UTC, petrmk
font files used in example - part 2 (871.90 KB, application/x-zip-compressed) 2005-01-07 12:22 UTC, petrmk

Description petrmk 2005-01-06 19:19:01 UTC

We are experiencing a strange problem:

Documents generated with FOP 0.20.5 have words randomly stuck together in the PDF.

We use FOP 0.20.5 to generate PDF documents from XSL and XML, both on the command 
line and programmatically. We are using Java j2sdk1.4.1_02.

Example:

code in XSL:
<fo:inline xsl:use-attribute-sets="chr.012">This text is example</fo:inline>

The PDF shows:
"Thistext is example"
or
"This textisexample"

When we open the same PDF document several times, different words are stuck 
together each time, not always the same ones. Occasionally the text is correct, 
with no words stuck together. When we look at the XSL, the text is correct.

The error can be reproduced only with large PDF files (our example has 5 pages) 
with long continuous text and many <fo:inline> sections.

We tried another XSL-FO processor, XEP 4.1, with the same XSL and XML files, 
and it produces a PDF without any formatting errors. That is why we assume that 
the error is in FOP.

We verified that the error can be reproduced in Adobe Acrobat Reader 5.0.0 CE, 
5.0.5 CE and 6.0.0 CE.
The error does not appear in AR 6.0.2 CE. But as you can understand, we cannot 
influence which version of AR our users will use...

In the document we use the Arial font, defined in userconfig and the 
corresponding XML definition file.

If needed, I can provide more information, such as:
- XSL and XML example
- fop configuration 

A side effect of this error is an occasional GPF in AR.

Any help is very welcome, because this looks like a serious problem for us.

Comment 1 petrmk 2005-01-06 19:21:10 UTC

Created attachment 13913
Example of error. Interesting is that this error is accompanied by formatting problem to which arrow is pointing

Comment 2 petrmk 2005-01-06 19:24:20 UTC

Comment on the picture:
the underlined text should be:
"nebo prevodem", "zvolene variante", "v pojistne"

Comment 3 Jeremias Maerki 2005-01-07 10:28:27 UTC

Please attach a testcase (FO file) so we can reproduce such a PDF ourselves.
Otherwise, I don't know how to help.

Comment 4 petrmk 2005-01-07 12:20:16 UTC

Created attachment 13924
Example of error. Please read readme.txt

Thanks for the response. Sorry that the example is in Czech. I hope you can see
the correct text in the attached image or in the text source in the XSL. If that
is a big problem, I can try to prepare an example with an English text section.
The error can be seen in the middle of the fifth page of the PDF example.

Comment 5 petrmk 2005-01-07 12:21:39 UTC

Created attachment 13925
font files used in example - part 1

Comment 6 petrmk 2005-01-07 12:22:23 UTC

Created attachment 13926
font files used in example - part 2

Comment 7 petrmk 2005-01-07 12:27:43 UTC

I am sending the font files, too, because I think the problem could be 
somewhere in font embedding or something similar. Our tests showed two things:

1) If you remove the font definitions for Arial from userconfig.xml, the error 
never occurs... But of course a # sign is displayed instead of the Czech characters.
2) Very occasionally Acrobat Reader shows an error with text similar to "Can not 
remove 2ac2eArial font ....." before the GPF.

Comment 8 Jeremias Maerki 2005-01-07 12:56:22 UTC

I checked on Acrobat Readers 4.0.5, 5.0, 6.0.2 and GhostScript 8.14 and all
showed your PDF and a newly generated one (using your sources) correctly, even
closing and reopening the file several times.

What you could try is to disable kerning by setting kerning="no" on each font,
but it may be necessary to remove all "kerning" elements from the font metrics
XML files. Maybe it has something to do with that but I'm not sure.
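
The kerning-removal step suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: the metrics file is taken to be well-formed UTF-8 XML whose kerning data sits in elements named "kerning" (the element name comes from the comment above), the class and method names KerningStripper and stripKerning are hypothetical, and the code uses Java 7+ file APIs rather than the 1.4 JDK mentioned in the report.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of FOP: drops every <kerning> element
// from a font metrics XML document and returns the remaining XML.
final class KerningStripper {

    static String stripKerning(String metricsXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(metricsXml)));

        // The NodeList is live, so walk it backwards while removing nodes.
        NodeList kerning = doc.getElementsByTagName("kerning");
        for (int i = kerning.getLength() - 1; i >= 0; i--) {
            Node node = kerning.item(i);
            node.getParentNode().removeChild(node);
        }

        // Serialize the modified DOM back to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Usage: java KerningStripper <input-metrics.xml> <output-metrics.xml>
    // Assumes the input file is encoded as UTF-8.
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: KerningStripper <in.xml> <out.xml>");
            return;
        }
        String in = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]),
                stripKerning(in).getBytes(StandardCharsets.UTF_8));
    }
}
```

Writing to a separate output file keeps the original metrics file untouched, so the two variants can be compared by pointing the font configuration at one or the other.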

At the moment I have no clue what's wrong.

Comment 9 petrmk 2005-01-07 17:44:04 UTC

Thank you for the quick reply.

Well, we have a typical helpdesk problem here: how to reproduce the problem the 
customer is reporting :-/. I know it, because I worked on a tech support helpdesk 
once upon a time :-)

I tried disabling kerning earlier and it does not help at all.

Your reply reveals that the problem can be reproduced only on regional versions of 
Acrobat Reader. It does not matter whether it is the Czech version or the English 
Central European version. Of course I did not test all existing versions of AR, 
but I think this sample is sufficient.

To comment on the versions you tested:
4.0.5 - we are not using it, so I did not test it
5.0.0 CE - regional Czech version produces the error
5.0.5 - English version does not produce the error
5.0.5 CE - regional Czech version produces the error
6.0.0 CE - English version for Central Europe produces the error
6.0.1 - English version does not produce the error
6.0.2 - as stated earlier, does not produce the problem on the regional Czech version

So it seems that the problem is tied to the English CE or Czech CE versions of 
Acrobat 5.x and 6.0.0 or 6.0.1.
But the question remains why the PDF produced by XEP 4.1 does not produce 
the error on any version.

I am not an expert, but just to give you some information: could the problem be 
connected with font embedding in some way? Maybe XEP uses a slightly different 
approach for it?

To help you reproduce the problem I put two examples of installation 
packages of AR 5.0.5 CE on our FTP:

ftp.sybase.cz/pub

You can use anonymous login.

Any conclusions that can help us solve this problem are welcome.

Comment 10 Jeremias Maerki 2005-01-11 10:31:16 UTC

I'm extremely reluctant to install software in a language I don't understand.
It's already enough of a pain to have a mixed German/English system. I'm sorry,
but I won't install that version of Acrobat.

IMO if different Acrobat variants (even with the same version numbers) produce
different results it's more of a problem of Acrobat. I'd check if Adobe support
has any idea.

Comment 11 petrmk 2005-01-11 11:00:37 UTC

I understand. 

In the meantime, is there any way to embed the complete TTF font into the PDF 
document instead of only the needed parts?

http://xml.apache.org/fop/fonts.html says:
"... When embedding TrueType fonts (ttf) or TrueType Collections (ttc), a 
subset of the original font, containing only the glyphs used, is embedded in 
the output document. ..."

XEP includes the complete font information and its PDFs do not produce the 
problem. 

We have some developer capacity to try to solve this problem ourselves, so if 
some change to the source would help, we can try it.

Comment 12 J.Pietschmann 2005-01-11 16:58:42 UTC

The description is a bit misleading. FOP embeds the whole font file into the
PDF, which is actually a bad idea (CJK font files are big). There are additional
structures in the PDF which are needed for mapping PDF character indexes into
glyph indexes. FOP fills these with data only for characters which have been
used in the document. I vaguely remember there are a few more PDF structures
still missing.