Bug 39422 - [PATCH] Fop fails to render non-ascii characters in PDF output
Status: NEW
Product: Fop - Now in Jira
Classification: Unclassified
Component: fonts
Version: 1.0
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assigned To: fop-dev
Keywords: PatchAvailable
Reported: 2006-04-27 05:22 UTC by Max Berger
Modified: 2012-04-11 03:21 UTC (History)

Attachments
A small test file using the summation (sigma) character (489 bytes, text/plain)
2006-04-27 05:24 UTC, Max Berger
Sample SVG with a SIGMA character (1.66 KB, image/svg+xml)
2006-04-28 16:04 UTC, Max Berger
patch for PDFGraphics2D that enables rendering of special characters (15.77 KB, patch)
2006-04-29 06:59 UTC, Max Berger

Description Max Berger 2006-04-27 05:22:47 UTC
Dear developers,

I know the font system is currently undergoing changes. Please excuse this report if the issue is about to be fixed anyway.

When rendering a document containing "special" characters to pdf, they get shown as # instead of the 
right character.

(fop svn, current checkout)

For a full example, see the attached test file.

Example:
test ∑ test

should render: test E test (where E is the sigma character).

On the pdf output I get:
test # test

AWT output works fine.

I don't know if this is specific to a Mac system. I have no user fonts installed.

(What it should do is figure out that the standard font does not have the SIGMA character and switch to 
the symbol font).

If there is no one working on it and you point me in the right direction I may be able to provide a patch.

Max
Comment 1 Max Berger 2006-04-27 05:24:09 UTC
Created attachment 18189 [details]
A small test file using the summation (sigma) character
Comment 2 Jeremias Maerki 2006-04-27 08:08:00 UTC
What you describe is a better coverage of the behaviour described by the
"font-selection-strategy" property. It would really be good if we had that. You
might want to check with Vincent Hennebert if he hasn't already done work in
this area. If he hasn't, this shouldn't clash too much with Vincent's work.

I haven't given much thought to how this would have to be implemented. I assume
the FOText objects (o.a.fop.fo package) would have to be split after the right
font for each snippet has been determined. But I'm not sure if this is enough.
Comment 3 Max Berger 2006-04-27 16:15:47 UTC
I am not sure if the fo-tree is the right place to fix this. I've also noticed the
same behavior when an SVG graphic contains a SIGMA sign. It renders fine in the
AWT output (squiggle, or AWT in fop), but not in the PDF version.
Comment 4 Vincent Hennebert 2006-04-27 22:07:59 UTC
This is because the default fonts used for AWT output have more glyphs than the
Base14 fonts used for PDF output.

And yes this is a font-selection-strategy issue. I haven't studied it yet in
detail, but my idea is that this shouldn't be dealt with at the font system
level. Rather the font system should provide facilities like getting the set of
glyphs which cannot be rendered in a given String, or perhaps getting the fonts
which can display a given glyph.

It's probably not worth starting to write code right now, as lots of things will
certainly change with the new font system. But if you're interested you may have
a look at the aXSL interface (www.axsl.org), which is the interface that the new
font system will implement. You may look if the provided methods help solving
this problem, what's missing, and where and how to implement
font-selection-strategy, within Fop, on top of that. This would certainly ease
the implementation once the migration is done.

Vincent
Comment 5 Max Berger 2006-04-27 23:13:27 UTC
Dear Vincent,

some kind of "hasGlyph" function is most certainly necessary. And maybe some helper functions to go 
with it, but these can easily be implemented using "hasGlyph".

the current font system has a "hasChar" function:
  public boolean hasChar(char c);
 
I've added a tracker item for axsl

http://sourceforge.net/tracker/index.php?func=detail&aid=1478049&group_id=123259&atid=695974

---

I've also found the source of my '#'. It is in o.a.f.render.pdf.PDFRenderer#escapeText:

            if (fs.hasChar(orgChar)) {
                ch = fs.mapChar(orgChar);
                int tls = (i < l - 1 ? parentArea.getTextLetterSpaceAdjust() : 0);
                glyphAdjust -= tls;
            } else {
                if (CharUtilities.isFixedWidthSpace(orgChar)) {
                    //Fixed width space are rendered as spaces so copy/paste works in a reader
                    ch = fs.mapChar(CharUtilities.SPACE);
                    glyphAdjust = fs.getCharWidth(ch) - fs.getCharWidth(orgChar);
                } else {
                    ch = fs.mapChar(orgChar);
                }
            }

and by default fs.mapChar(orgChar) returns '#' if the char is not in that font. 

----

so my "dirty hack" solution would be:
- if glyph is not in font, go through list of all fonts until you find a font that has this glyph, use it 
instead.

a good solution would be:
- get a list of all fonts supporting that glyph. Find the one that is the "best match". use it.
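
The "dirty hack" variant above could look roughly like the following sketch. It is not FOP code: SketchFont, hasGlyph and FallbackFontLookup are made-up names for illustration, and glyph coverage is faked with a plain set of code points. Only the first-match lookup is shown; ranking the candidates to find a "best match" is left out.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for a font: a name plus the code points it can render. */
final class SketchFont {
    final String name;
    private final Set<Integer> codePoints;

    SketchFont(String name, Set<Integer> codePoints) {
        this.name = name;
        this.codePoints = codePoints;
    }

    /** Plays the role of hasChar(char), but takes a full code point. */
    boolean hasGlyph(int codePoint) {
        return codePoints.contains(codePoint);
    }
}

final class FallbackFontLookup {
    /**
     * Returns the preferred font if it has the glyph, otherwise the first
     * fallback font that has it, otherwise empty (the caller can then warn
     * and substitute '#').
     */
    static Optional<SketchFont> firstFontWithGlyph(
            SketchFont preferred, List<SketchFont> fallbacks, int codePoint) {
        if (preferred.hasGlyph(codePoint)) {
            return Optional.of(preferred);
        }
        for (SketchFont candidate : fallbacks) {
            if (candidate.hasGlyph(codePoint)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With a fallback list of Symbol and ZapfDingbats, a lookup for the summation sign (U+2211) would skip the preferred text font and return the Symbol entry, provided the coverage sets are filled accordingly.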

Max
Comment 6 Victor Mote 2006-04-28 00:37:04 UTC
(In reply to comment #5)

Hi Max:

> I've added a tracker item for axsl
> http://sourceforge.net/tracker/index.php?func=detail&aid=1478049&group_id=123259&atid=695974

I am responding to the aXSL request here so that I can pick up the existing 
thread. The aXSL methods you are looking for are:
FontUse boolean glyphAvailable(int codePoint)
FontUse int unavailableChar(CharSequence chars, int beginIndex)
FontUse int[] unavailableChars(CharSequence chars, int beginIndex)

These methods are in FontUse instead of Font so that we can properly deal with 
Encoding issues, mostly for Type1 fonts. FontUse is the intersection of a Font, 
a FontConsumer, and an Encoding. FontUse instances are what get returned by the 
font-selection methods. For a glyph to be usable by your application, it must 
both 1) be available in the font, and 2) encodable by the font's encoding.

> so my "dirty hack" solution would be:
> - if glyph is not in font, go through list of all fonts until you find a font
> that has this glyph, use it instead.
> a good solution would be:
> - get a list of all fonts supporting that glyph. Find the one that is the
> "best match". use it.

This might be permissible under font-selection-strategy="auto", but I rather 
think would only be permissible as a fallback. What you probably really want is 
to implement font-selection-strategy="character-by-character". The
font-selection methods in aXSL require one codepoint to be passed, presumably the
first codepoint that needs to be encoded. Then, using the methods noted above, 
your application needs to determine whether the remaining text can use the same 
font. If not, the font-selection method needs to be consulted again, this time 
passing the codepoint that is not served by the first font selected. IIRC, the 
last time I looked at FOP code, it took the first font-family in the list and 
used it for all text within scope, simply using a # glyph if the desired glyph 
was not available. I think Vincent is working on changing that, but I don't 
monitor the FOP lists, and don't know the status. When I implemented this in 
FOray, the hard part was not the algorithm, but finding the place to store its 
results. Either the FOTree or AreaTree has to know how to segment a chunk of 
text based on font selection.
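
A rough sketch of that segmentation loop, for illustration only. It does not use the real aXSL or FOP classes: CoverageFont, FontRun and RunSegmenter are hypothetical names, glyphAvailable merely borrows the method name quoted above, and "font selection" is reduced to walking the font-family list in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a selected font; coverage is a plain set of code points. */
final class CoverageFont {
    final String name;
    private final Set<Integer> covered;

    CoverageFont(String name, Set<Integer> covered) {
        this.name = name;
        this.covered = covered;
    }

    boolean glyphAvailable(int codePoint) {
        return covered.contains(codePoint);
    }
}

/** Text range [begin, end) in char indices, served by one font (null if no font covers it). */
final class FontRun {
    final int begin;
    final int end;
    final CoverageFont font;

    FontRun(int begin, int end, CoverageFont font) {
        this.begin = begin;
        this.end = end;
        this.font = font;
    }
}

final class RunSegmenter {
    /** Simplified font selection: first font in the family list that has the glyph. */
    static CoverageFont select(List<CoverageFont> fontFamily, int codePoint) {
        for (CoverageFont font : fontFamily) {
            if (font.glyphAvailable(codePoint)) {
                return font;
            }
        }
        return null;
    }

    /** Splits text into runs; selection is consulted again whenever the current font fails. */
    static List<FontRun> segment(CharSequence text, List<CoverageFont> fontFamily) {
        List<FontRun> runs = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int start = i;
            int first = Character.codePointAt(text, i);
            CoverageFont font = select(fontFamily, first);
            i += Character.charCount(first);
            // Extend the run while the remaining text can use the same font.
            while (i < text.length()) {
                int next = Character.codePointAt(text, i);
                boolean sameFont = (font != null)
                        ? font.glyphAvailable(next)
                        : select(fontFamily, next) == null;
                if (!sameFont) {
                    break;
                }
                i += Character.charCount(next);
            }
            runs.add(new FontRun(start, i, font));
        }
        return runs;
    }
}
```

The resulting list of runs is the piece of information that, as noted above, either the FOTree or the AreaTree would have to store.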

So, although the font system provides tools that are needed for the correct 
algorithm, it doesn't have any control over whether the correct algorithm is 
used.

Since the XSL-FO "font-family" property is really a *list*, to ensure that you 
get a sigma character, you might say font-family="Base14-Helvetica,
Base14-Symbol". Assuming the other font-selection criteria allow it, your sigma
character would then be handled by the Base14-Symbol font.

HTH.

Victor Mote
Comment 7 Max Berger 2006-04-28 16:03:17 UTC
I really think it is counterintuitive to have to add "symbol" to my list of favorite fonts. Unfortunately this 
is what the spec says (xsl 1.1 / 7.9.2). So the proper font-selection strategy would solve the problem 
presented in my original file. Then I guess I'll just have to wait for that...

Then I have two wishes for that. The first is that the "Symbol" and "ZapfDingbats" fonts are part of the 
"default" font lists, such as the one when you do not specify a font and when you specify a generic font 
such as "serif" or "sans-serif".

The second wish is a warning when an unsupported glyph is encountered. The PSRenderer does that 
currently.

The next problem, however, is the inclusion of SVG graphics that contain glyphs. I'll attach a sample file 
that works fine in svg viewers, but not within a fop-pdf due to the same font issues. Or would this be a 
batik / xmlgraphics issue?
Comment 8 Max Berger 2006-04-28 16:04:13 UTC
Created attachment 18200 [details]
Sample SVG with a SIGMA character
Comment 9 Max Berger 2006-04-29 03:47:24 UTC
Here is some additional info.

When looking for the font-mechanism, I noticed a lot of duplicate code between
o.a.f.svg.PDFGraphics2D#drawString
and
o.a.f.render.pdf.PDFRenderer#drawWord

IMO a lot of this should go into a common place, ideally even into xmlgraphics:
org.apache.xmlgraphics.pdf sounds like a very good place....

For the SVG graphics this suddenly seems much easier, as there already is a "drawString" method 
which is able to switch fonts. 

So here's a possible solution for my SVG problem:
- make the simpler drawString method call the attributed one. (shouldn't hurt). They have lots of 
duplicated code anyways.
- implement automatic font switching in the attributed drawString method. (Simple for now, and maybe 
more sophisticated in the future).  Have it in an external function so that it can be reused later.
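
To illustrate the idea (this is a sketch, not the PDFGraphics2D code): the plain string is turned into a java.text.AttributedString, and every character the base font cannot show is tagged with a fallback family, so that an attributed drawString can switch fonts per run. SymbolFallbackText, withFallback and describeRuns are made-up names, and the glyph check is passed in as a predicate because the real font lookup is not shown here.

```java
import java.awt.font.TextAttribute;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class SymbolFallbackText {
    /** Tags each character the base font cannot show with the fallback family. */
    static AttributedString withFallback(
            String text, IntPredicate baseHasGlyph, String fallbackFamily) {
        AttributedString attributed = new AttributedString(text);
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int next = i + Character.charCount(codePoint);
            if (!baseHasGlyph.test(codePoint)) {
                attributed.addAttribute(TextAttribute.FAMILY, fallbackFamily, i, next);
            }
            i = next;
        }
        return attributed;
    }

    /** Lists the font runs as "family:text", the way an attributed drawString would walk them. */
    static List<String> describeRuns(String text, AttributedString attributed, String baseFamily) {
        List<String> runs = new ArrayList<>();
        AttributedCharacterIterator it = attributed.getIterator();
        int pos = it.getBeginIndex();
        while (pos < it.getEndIndex()) {
            it.setIndex(pos);
            Object family = it.getAttribute(TextAttribute.FAMILY);
            int limit = it.getRunLimit(TextAttribute.FAMILY);
            // Untagged runs carry no FAMILY attribute and use the base font.
            String name = (family != null) ? family.toString() : baseFamily;
            runs.add(name + ":" + text.substring(pos, limit));
            pos = limit;
        }
        return runs;
    }
}
```

The simple drawString(String, x, y) would then only build the attributed text and hand its iterator to the attributed variant, which keeps the font switching in one place.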

For the special characters in the text I'll wait for the new font system.
Comment 10 Max Berger 2006-04-29 06:59:01 UTC
Created attachment 18205 [details]
patch for PDFGraphics2D that enables rendering of special characters

This patch implements the strategy from my last comment.

The font-selection is very basic and just tries Symbol and ZapfDingbats (which
are most likely to have the desired character)

Using this patch enables my svg-formulas to be displayed correctly. For the
in-fo-text characters I'll wait for the font redesign.
Comment 11 Glenn Adams 2012-04-07 01:42:50 UTC
resetting P2 open bugs to P3 pending further review
Comment 12 Glenn Adams 2012-04-11 03:21:14 UTC
increase priority for bugs with a patch