Bug 48567

Summary: Implementation of support for double-byte fonts by AFP renderer.
Product: Fop - Now in Jira Reporter: Peter Hancock <peter.hancock>
Component: fontsAssignee: fop-dev
Status: CLOSED FIXED    
Severity: enhancement    
Priority: P2    
Version: trunk   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Proposed patch
patch of implementation
xdocs

Description Peter Hancock 2010-01-18 07:25:10 UTC
The requirment is detailed here http://wiki.apache.org/xmlgraphics-fop/AFPFonts
Comment 1 Peter Hancock 2010-01-18 07:31:51 UTC
Created attachment 24854 [details]
Proposed patch
Comment 2 Peter Hancock 2010-01-19 02:59:38 UTC
Created attachment 24857 [details]
patch of implementation
Comment 3 Jeremias Maerki 2010-01-20 07:30:29 UTC
Peter, I've taken a peek at your code. Eclipse had a bit of trouble applying the patch because you renamed AFPFontReader to CharacterSetBuilder, but that was not insurmountable. From the functionality POV I guess this works fine. At least I get nicely formatted Japanese text which seems to be consistent with PDF output generated with Arial Unicode MS.

You've asked me off-list if it is correct to get the em space width from the FNO field rather than the FNC field. But I don't see any information on the em space in the FNC field. At any rate, FNO is the right place to get this from IMO.

While going through your changes I wondered if it would not have made sense to extract the em space in every case, not just in the double-byte case. That could have reduced the code duplication a bit that is caused by org.apache.fop.afp.fonts.CharacterSetBuilder.DoubleByteLoader. At least the processFontOrientation() method looks like unnecessary code duplication to me. Furthermore, it is a bit strange that you call this "unitPerEm" in CharacterSetOrientation what is actually the em space increment. Maybe that's what you confused in FNC with. There's a unit per em there but that has a different purpose.

I'm wondering if you would agree that using "Type 0" (or "CIDKeyed") instead of "double-byte" is a more accurate choice for the font type in the user configuration.

Do you think that the fallback character business in the configuration is really necessary? I fear that this is too complicated for users. I don't quite see what it would help. Can you explain why you added this?

A few observations while looking into this (not related to your work):
- Some of the AFP viewers to quite some time to display the files with the embedded Japanese font, the Papyrus Client being the exception. Maybe that is just so. At least there doesn't seem to be anything technically wrong.
- The restriction of the font and codepage names to exactly 8 characters is something we should relax eventually (by creating actual FullyQualifiedNameTriplet instances in MapCodedFont). Probably a remainder from the times where the triplets were not encapsulated in classes. At least there is no such restriction in the AFP specs.

If you don't have anything else that you need to address with this patch, I could commit it later this week. I'd like your feedback on my comments above before I do that. I could handle the few changes myself. Since I've already made some local, cosmetic changes locally, that would save my applying the patch a second time. However, if you could update the documentation for the website once we've finalized the above, that'd be appreciated.
Comment 4 Peter Hancock 2010-01-20 08:55:17 UTC
(In reply to comment #3)


> Peter, I've taken a peek at your code.
Thanks for reviewing the patch so quicky!

> Eclipse had a bit of trouble applying
> the patch because you renamed AFPFontReader to CharacterSetBuilder, but that
> was not insurmountable.
Sorry about that!


> You've asked me off-list if it is correct to get the em space width from the
> FNO field rather than the FNC field. But I don't see any information on the em
> space in the FNC field. At any rate, FNO is the right place to get this from
> IMO.

> Furthermore, it is a bit strange that you call this "unitPerEm" in
> CharacterSetOrientation what is actually the em space increment. Maybe that's
> what you confused in FNC with. There's a unit per em there but that has a
> different purpose.

I was unsure if there was a distinction between 'em space increment' and 'units per em' - now I know.


> While going through your changes I wondered if it would not have made sense to
> extract the em space in every case, not just in the double-byte case. That
> could have reduced the code duplication a bit that is caused by
> org.apache.fop.afp.fonts.CharacterSetBuilder.DoubleByteLoader. At least the
> processFontOrientation() method looks like unnecessary code duplication to me.
That makes sense - we can remove the now redundant CharacterSetOrientation(int) constructor as well.




> I'm wondering if you would agree that using "Type 0" (or "CIDKeyed") instead of
> "double-byte" is a more accurate choice for the font type in the user
> configuration.
CIDKeyed then?

> 
> Do you think that the fallback character business in the configuration is
> really necessary? I fear that this is too complicated for users. I don't quite
> see what it would help. Can you explain why you added this?
using a fallback character was my first approach and the em space, as recommended by yourself, my second.  In retrospect I now totally agree that the fallback character is probably useless and certainly reduces usability.


> - The restriction of the font and codepage names to exactly 8 characters is
> something we should relax eventually (by creating actual
> FullyQualifiedNameTriplet instances in MapCodedFont). Probably a remainder from
> the times where the triplets were not encapsulated in classes. At least there
> is no such restriction in the AFP specs.
I am looking at this now.  I thought it may have been an AFP restriction since I seemed to get a badly formed AFP when I forced through a name of length 6.  This will have to be resolved at a later time I guess. 


> If you don't have anything else that you need to address with this patch, I
> could commit it later this week. I'd like your feedback on my comments above
> before I do that. I could handle the few changes myself. Since I've already
> made some local, cosmetic changes locally, that would save my applying the
> patch a second time.
I agree with all of your suggestions.  Before you go ahead and commit I wonder how easy it is to configure fop to not embed the font.  Is much dev work required do you think?  In http://xmlgraphics.apache.org/fop/trunk/fonts.html it is claimed that not supplying an embed-url attribute directs fop not to embed the font, but this is not so,  and in fact, fop never seems to have a chance at setting the embeddable flag on an AFP font.  When AFPRendererConfigurator constructs the font instance it could set the flag then based on an attribute, but what other information is needed?  When I manually set embeddable to false the rendered AFP only declares the font resource and the code page in the MCF structured field.  I have not tried printing yet (Our afp printer is playing up for me today) or checked the spec to see if this is ok.  If you happen to know that it should work we could perhaps add a boolean-like attribute to the font attribute (respected when type="CIDKeyed" or all AFP?) and call AFPFont.setEmbeddable() upon creation (or pass a constructor arg). Otherwise I guess this would be a  future unit work.



>However, if you could update the documentation for the
> website once we've finalized the above, that'd be appreciated.
I will attach this as a second patch to this bug.
Comment 5 Peter Hancock 2010-01-21 07:35:10 UTC
Created attachment 24877 [details]
xdocs
Comment 6 Peter Hancock 2010-01-21 07:46:26 UTC
The attachment 'xdocs' is the corresponding documentation update for the webserver
Comment 7 Jeremias Maerki 2010-01-21 08:01:33 UTC
Thanks for your feedback. We're in agreement, it seems.

(In reply to comment #4)
<snip/>
> I agree with all of your suggestions.  Before you go ahead and commit I wonder
> how easy it is to configure fop to not embed the font.  Is much dev work
> required do you think?  In http://xmlgraphics.apache.org/fop/trunk/fonts.html
> it is claimed that not supplying an embed-url attribute directs fop not to
> embed the font, but this is not so,  and in fact, fop never seems to have a
> chance at setting the embeddable flag on an AFP font.  When
> AFPRendererConfigurator constructs the font instance it could set the flag then
> based on an attribute, but what other information is needed?  When I manually
> set embeddable to false the rendered AFP only declares the font resource and
> the code page in the MCF structured field.  I have not tried printing yet (Our
> afp printer is playing up for me today) or checked the spec to see if this is
> ok.  If you happen to know that it should work we could perhaps add a
> boolean-like attribute to the font attribute (respected when type="CIDKeyed" or
> all AFP?) and call AFPFont.setEmbeddable() upon creation (or pass a constructor
> arg). Otherwise I guess this would be a  future unit work.

AFP Font referencing should work via the mechanism described under:
http://xmlgraphics.apache.org/fop/trunk/fonts.html#embedding
(<referenced-fonts>)
At least, I've tested that with outline and bitmap fonts at some point. However, we need to see what happens in the case of CIDKeyed because of the MapCodedFont object. I'll do a test.
Comment 8 Jeremias Maerki 2010-01-21 09:48:08 UTC
I've applied your patches (documentation, too):
http://svn.apache.org/viewvc?rev=901793&view=rev
Thanks for the patch!

I've tested with referenced-fonts and that seems to do what I expect. The MapCodedFont gets generated and should work if the font and the codepage are installed on the target platform. Lacking a full AFP environment I cannot easily test that to the very end. The only thing I noticed is that the viewers don't recognize that the font is double-byte if the fonts are not embedded and not available to the editor. I've added a TODO in MapCodedFont with an idea which might help. In case you look into that class you may want to take a look.

As for the Unicode blocks we use to determine whether we have to use the em space instead of a normal space for characters which don't provide individual metrics, I think we don't have all that we need, yet. I've stumbled over http://unicode.org/reports/tr11/ which mentions fullwidth and halfwidth characters. However, I could not yet find the location where the information for the destinction for each character can be found. With the font we tested with we got away since the em space equals the normal space increment. For fonts that have western characters, we may not get away with the current set of Unicode Blocks. Not knowing enough about eastern languages is a real impediment for me here.
Comment 9 Peter Hancock 2010-01-22 01:40:08 UTC
(In reply to comment #8)
> I've applied your patches (documentation, too):
> http://svn.apache.org/viewvc?rev=901793&view=rev
> Thanks for the patch!
It is a pleasure to be able to contribute!

 
> As for the Unicode blocks we use to determine whether we have to use the em
> space instead of a normal space for characters which don't provide individual
> metrics, I think we don't have all that we need, yet. I've stumbled over
> http://unicode.org/reports/tr11/ which mentions fullwidth and halfwidth
> characters. However, I could not yet find the location where the information
> for the destinction for each character can be found. With the font we tested
> with we got away since the em space equals the normal space increment. For
> fonts that have western characters, we may not get away with the current set of
> Unicode Blocks. Not knowing enough about eastern languages is a real impediment
> for me here.
Perhaps we should make the fallback strategy external from where it is applied and perhaps configurable.  Maybe the 'char => fallback char' mapping  can be externalized in a similiar way to that of the base-14 font metrics.  Commiting improvements to this mapping - or even supporting customization of - would then be decoupled from the /src/java codebase.
Comment 10 Glenn Adams 2012-04-01 06:32:32 UTC
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed