Bug 48733 - Extend Line Break algorithm to support non-standard configuration
Status: CLOSED INVALID
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: general
Version: 0.95
Hardware: PC Windows XP
Importance: P2 enhancement
Target Milestone: ---
Assignee: fop-dev
Reported: 2010-02-12 04:21 UTC by Alex Watson
Modified: 2012-04-01 13:55 UTC

Description Alex Watson 2010-02-12 04:21:13 UTC
The current line breaking algorithm in FOP 0.95 uses two static classes, LineBreakStatus and LineBreakUtils, to implement the standard Unicode line breaking algorithm (UAX #14, formerly UTR 14).

In particular, this handles currency expressions such as $US150 and $A250 so that no line break occurs within an amount.

However, my company uses a different convention of putting the country before the currency symbol (e.g. US$150 and A$250). Unfortunately, the algorithm allows a line break between the country and the currency symbol.

I have checked the UTR 14 spec and I know our convention is non-standard: the spec prohibits a break for the pair (PR, AL), i.e. Prefix Numeric followed by Alphabetic, but allows one for (AL, PR).

My enhancement request is for a way to override the default line-break pairs at runtime. This is not currently possible because the LineBreakStatus and LineBreakUtils classes are static with private members.

The simplest change would be to make the LineBreakUtils class public, make PAIR_TABLE non-final, and add a method setLineBreakPairProperty(int, int, byte). This would not break any existing unit tests or functionality, but would allow me to write something like this when I create my FopFactory:

LineBreakUtils.setLineBreakPairProperty(
  LineBreakUtils.LINE_BREAK_PROPERTY_AL,
  LineBreakUtils.LINE_BREAK_PROPERTY_PR,
  LineBreakUtils.PROHIBITED_BREAK);
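
To make the proposal concrete, here is a minimal sketch of such a setter on a toy stand-in class. It is not FOP's generated LineBreakUtils: the class name PairTableSketch, the two-class table, and the constant values are assumptions for illustration, and only the (PR, AL) and (AL, PR) entries reflect the behaviour described above.

```java
/** Toy stand-in for a line-break pair table; NOT FOP's generated LineBreakUtils. */
final class PairTableSketch {
    // Hypothetical class indices and break actions, for illustration only.
    static final int AL = 0; // Alphabetic
    static final int PR = 1; // Prefix Numeric
    static final byte DIRECT_BREAK = 0;
    static final byte PROHIBITED_BREAK = 1;

    // Rows: class before the break opportunity; columns: class after it.
    // Only (PR, AL) and (AL, PR) follow the report; the rest are placeholders.
    private static final byte[][] PAIR_TABLE = {
        /* AL before */ {PROHIBITED_BREAK, DIRECT_BREAK},
        /* PR before */ {PROHIBITED_BREAK, DIRECT_BREAK},
    };

    private PairTableSketch() { }

    /** Returns the break action for the given pair of classes. */
    static byte getLineBreakPairProperty(int before, int after) {
        return PAIR_TABLE[before][after];
    }

    /** Overrides the break action for one pair; affects every caller in the JVM. */
    static void setLineBreakPairProperty(int before, int after, byte value) {
        PAIR_TABLE[before][after] = value;
    }
}
```

Because the table is static, an override made this way is visible to everything running in the same JVM.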

However, as LineBreakUtils is a generated Java file, I have an alternative patch for LineBreakStatus that would achieve the same thing (but is a little more work).

Please advise if there is a way to achieve this with current FOP functionality, or if my suggested code enhancements are appropriate. I can provide PATCH code for either approach if requested.
Comment 1 Manuel Mall 2010-02-12 06:50:00 UTC
(In reply to comment #0)
> 
> However, my company uses a different convention of putting the country before
> the currency symbol (eg. US$150 and A$250). Unfortunately, this allows a line
> break between the country and the currency symbol.
>
The usual solution for special cases that require overriding the standard line breaking is to add a zero-width breaking or non-breaking character (depending on what you want to achieve) to the input stream, typically as part of the XSL transformation step if you have one.
> 
> The simplest change would be to make the LineBreakUtils class public, change
> the PAIR_TABLE to non-final and then add a method setLineBreakPairProperty(int,
> int, byte). This would not break any existing unittests or functionality, but
> would allow me to write something like this when I create my FopFactory:
> 
> LineBreakUtils.setLineBreakPairProperty(
>   LineBreakUtils.LINE_BREAK_PROPERTY_AL,
>   LineBreakUtils.LINE_BREAK_PROPERTY_PR,
>   LineBreakUtils.PROHIBITED_BREAK);
> 
While it is desirable to make line breaking behaviour more flexible, your approach of exposing setters on static variables, making them modifiable at runtime, is not thread-safe. It also does not allow line breaking to be customized per FOP invocation, because these tables are shared across all FOP instances.

A more generic approach, possibly using interfaces, factories and configurable line breaking providers, may be more appropriate. However, as pointed out above, this may be overkill in your case: depending on your FOP production pipeline, injecting the non-breaking word joiner (U+2060) into your currency amounts could be a simple solution that requires no FOP code changes.
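
As a rough illustration of the provider idea, the sketch below wraps a base pair lookup in an immutable object that overrides a single pair. Nothing here exists in FOP: the names LineBreakPairProvider and OverridingPairProvider, the method pairProperty, and the constant values are all hypothetical.

```java
/** Hypothetical per-instance source of line-break pair actions (not a FOP API). */
interface LineBreakPairProvider {
    byte DIRECT_BREAK = 0;      // assumed value, for illustration
    byte PROHIBITED_BREAK = 1;  // assumed value, for illustration

    /** Returns the break action between the class before and the class after. */
    byte pairProperty(int before, int after);
}

/** Immutable decorator that overrides one pair and delegates everything else. */
final class OverridingPairProvider implements LineBreakPairProvider {
    private final LineBreakPairProvider fallback;
    private final int before;
    private final int after;
    private final byte value;

    OverridingPairProvider(LineBreakPairProvider fallback, int before, int after, byte value) {
        this.fallback = fallback;
        this.before = before;
        this.after = after;
        this.value = value;
    }

    @Override
    public byte pairProperty(int b, int a) {
        // All fields are final and never mutated, so instances can be shared across threads.
        return (b == before && a == after) ? value : fallback.pairProperty(b, a);
    }
}
```

Each FOP invocation could then be handed its own provider, and further overrides could be added by wrapping one provider in another, without touching shared static state.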
Comment 2 Vincent Hennebert 2010-02-12 10:39:10 UTC
Hi,

As Manuel said, the appropriate way to deal with such special cases is to insert Word Joiner (U+2060) to prohibit a break, or Zero Width Space (U+200B) to allow one, at the right places in the text. In your case: US⁠$150, A⁠$250. Have a look at chapter 16 of the Unicode Standard 5.2 (http://www.unicode.org/versions/Unicode5.2.0/). This is much more flexible, as it allows you to alter the algorithm's behaviour only locally; it's also more practical than modifying FOP's source code.

If you have any further questions, feel free to ask on the fop-users list:
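
If the text passes through Java before reaching FOP, the insertion could be sketched as below. The class name CurrencyJoiner, the method protect, and the pattern are assumptions: the pattern treats any run of uppercase letters directly followed by a dollar sign and a digit as a country prefix, which may need tightening for real data.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-processing helper; not part of FOP. */
final class CurrencyJoiner {
    private static final String WORD_JOINER = "\u2060";

    // Uppercase letters, then '$' and a digit, e.g. "US$1" in "US$150".
    private static final Pattern COUNTRY_BEFORE_SYMBOL =
            Pattern.compile("(\\p{Lu}+)(\\$\\d)");

    private CurrencyJoiner() { }

    /** Inserts U+2060 between the country prefix and the currency symbol. */
    static String protect(String text) {
        return COUNTRY_BEFORE_SYMBOL.matcher(text)
                .replaceAll("$1" + WORD_JOINER + "$2");
    }
}
```

For example, CurrencyJoiner.protect("US$150") returns the same text with a Word Joiner between "US" and "$", while "$US150" is left unchanged.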
http://xmlgraphics.apache.org/fop/maillist.html#fop-user

HTH,
Vincent
Comment 3 Glenn Adams 2012-04-01 13:55:06 UTC
batch transition to closed remaining pre-FOP1.0 resolved bugs