Bug 41671 - Linebreaking in tables

Status:	CLOSED INVALID

Alias:	None

Product:	Fop - Now in Jira
Classification:	Unclassified
Component:	page-master/layout
Version:	trunk
Hardware:	Other other

Importance:	P2 regression
Target Milestone:	---
Assignee:	fop-dev

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-02-21 12:02 UTC by Andreas L. Delmelle
Modified:	2012-04-01 13:52 UTC
CC List:	1 user

Attachments
Example FO demonstrating the problem (3.69 KB, text/xml) 2007-02-21 12:03 UTC, Andreas L. Delmelle


Description Andreas L. Delmelle 2007-02-21 12:02:42 UTC

Reported by Gilles Beaugeais on fop-users:
At first glance, it seems like this could be a regression due to the UAX#14 linebreaking, but that is a wild 
guess. I'll investigate the attached sample, and run it through the debugger later, if nobody beats me to it.

I've already added the same block without hyphenation, and giving this a quick run, it seems the problem 
wrt dashes being ignored as possible linebreaks is unrelated to hyphenation settings...

Comment 1 Andreas L. Delmelle 2007-02-21 12:03:47 UTC

Created attachment 19622
Example FO demonstrating the problem

Comment 2 Manuel Mall 2007-02-21 14:19:09 UTC

Yes the changed behaviour is due to the UAX#14 changes but as far as I can tell
the new behaviour is in line with the UAX#14 spec
(http://www.unicode.org/reports/tr14/).

Rule 21 says: LB21  Do not break before hyphen-minus,....
and
Rule 25 prevents a linebreak between a hyphen and a digit that follows it

This means the only legal breakpoint in the text in question
'C-12-188-440/NH-000' is the forward slash which is the one FOP chooses.

You could surround the hyphen with ZWSP or use an EM DASH instead of the HYPHEN
to generate a line breaking opportunity.
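
The ZWSP workaround could be applied while preparing the data, before it
reaches the FO file. Below is a minimal sketch, assuming the text is
available as a Java string; the class and method names are hypothetical and
not part of FOP. It inserts the ZWSP only after each hyphen (an assumption,
so that the hyphen stays at the end of the line when a break is taken).

```java
/** Hypothetical pre-processing helper; not part of FOP. */
final class HyphenBreakHelper {
    private static final char ZWSP = '\u200B'; // ZERO WIDTH SPACE

    private HyphenBreakHelper() { }

    /**
     * Returns the text with a ZWSP inserted after every HYPHEN-MINUS
     * that is not the last character of the string.
     */
    static String allowBreakAfterHyphens(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c);
            if (c == '-' && i < text.length() - 1) {
                sb.append(ZWSP);
            }
        }
        return sb.toString();
    }
}
```

For example, allowBreakAfterHyphens("C-12-188-440/NH-000") leaves the visible
characters unchanged and adds one invisible ZWSP after each of the four
hyphens.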

I have changed the bug to INVALID but feel free to disagree.

Comment 3 Vincent Hennebert 2007-02-22 00:32:37 UTC

(In reply to comment #2)
> Yes the changed behaviour is due to the UAX#14 changes but as far as I can tell
> the new behaviour is in line with the UAX#14 spec
> (http://www.unicode.org/reports/tr14/).
> 
> Rule 21 says: LB21  Do not break before hyphen-minus,....
> and
> Rule 25 prevents a linebreak between a hyphen and a digit that follows it
> 
> This means the only legal breakpoint in the text in question
> 'C-12-188-440/NH-000' is the forward slash which is the one FOP chooses.
> 
> You could surround the hyphen with ZWSP or use an EM DASH instead of the HYPHEN
> to generate a line breaking opportunity.
> 
> I have changed the bug to INVALID but feel free to disagree.

I think you're right, actually: a hyphen character shouldn't be used in such a
case. That reminds me of something I saw in a book on typographic rules: the
proper character to use here is the en dash (U+2013), as in date ranges
(e.g., 2001-2005). I guess a break would be allowed then.
Now that UAX#14 is implemented, incorrect uses of hyphens will start to stand
out. Let's get prepared to teach people about the right use of the several dash
characters: hyphen, en dash, em dash, quotation dash, etc. A new era of
high-level typography has dawned...

Comment 4 Manuel Mall 2007-02-22 01:04:46 UTC

Of course 'high-level typography' doesn't really help you much if what you need
to do is generate invoices, orders, ... or the like from existing datasources,
e.g. databases, in which for example order numbers or item numbers are stored in
the good old ASCII character set using the hyphen.

Let's see if this breaking behaviour becomes a trouble spot down the track. There
is always the option, and the spec explicitly allows that for certain rules, to
make this somehow configurable.

Comment 5 Gilles Beaugeais 2007-02-22 14:44:56 UTC

Thanks for the explanations. I didn't know these rules.
(The specs seem a little 'strange' to me: the basic space
character allows a break, whereas the nbsp entity keeps
words together; but the basic hyphen character keeps
numbers together, whereas the endash entity allows a break.
It is very confusing!)

It is very annoying for me, as I have thousands of XML files
written with hyphens. And it is impossible to ask users to
insert an endash instead of a hyphen: it is time-consuming, and
an endash displays differently from a hyphen (with a font
like Arial).

So could someone tell me if it is easy (and where, if possible)
to modify the source code to make the hyphen behave the
same as the endash or emdash?

Thanks again for your help and your work on FOP,

Comment 6 Manuel Mall 2007-02-22 15:23:01 UTC

It's just software so anything is possible :-)

Historically the hyphen is one of those characters which is vastly overloaded
with different meanings in different contexts. The UAX#14 spec has taken one
particular approach to its interpretation which admittedly doesn't match well in
some legacy situations, i.e. situations in which the hyphen is used in a context
different from the one the Unicode standard expects it to be used in.

The quick and dirty hack (untested of course) would be to add in
org.apache.fop.text.linebreak.LineBreakStatus in the method nextChar() at the
beginning something like:

if (c == '-') c = '\u2013';

That is, whenever we give a hyphen to the line breaker, convert it into an EN DASH
before the line breaker deals with it.
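
The substitution can be pictured in isolation as follows. This is only a
sketch with hypothetical names; it does not reproduce the real nextChar()
method or its signature, and it does not settle whether the rendered glyph
stays a hyphen, which depends on where the mapped character is used.

```java
/** Hypothetical illustration of the hyphen-to-en-dash mapping; not FOP code. */
final class HyphenAsEnDash {
    static final char HYPHEN_MINUS = '-';
    static final char EN_DASH = '\u2013';

    private HyphenAsEnDash() { }

    /** Returns the character the line breaker should see in place of c. */
    static char forLineBreaking(char c) {
        return c == HYPHEN_MINUS ? EN_DASH : c;
    }

    /** Maps a whole string; intended only as input to a break analysis. */
    static String forLineBreaking(String text) {
        return text.replace(HYPHEN_MINUS, EN_DASH);
    }
}
```

Every other character, including the forward slash, passes through unchanged,
so only the break behaviour around hyphens is affected.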

A probably better solution, but requiring some understanding of the UAX#14 spec,
would be to change the actual pair table in
src/codegen/unicode/data/LineBreakPairTable.txt and to regenerate the java code
using the codegen-unicode ant target. For example change the cell (row HY / col
NU) from % (indirect break opportunity) to _ (direct break opportunity). That
would allow a break between a hyphen and a numeric that directly follows it.
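
To see the effect of that one cell, here is a toy model of a pair-table
lookup. It is a sketch under stated assumptions, not FOP's generated code:
it knows only four classes (AL, NU, HY, SY), the cell values are a simplified
excerpt chosen to match the behaviour described in this bug, '^' is assumed
to mean a prohibited break, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a pair-table line breaker; not FOP's generated code. */
final class ToyPairTable {
    // Simplified line-break classes, used as row and column indices.
    static final int AL = 0, NU = 1, HY = 2, SY = 3;

    private ToyPairTable() { }

    /** Rows: class before the gap. Columns: class after the gap. */
    static char[][] defaultTable() {
        return new char[][] {
            //        AL   NU   HY   SY
            /* AL */ {'%', '%', '%', '^'},
            /* NU */ {'%', '%', '%', '^'},
            /* HY */ {'_', '%', '%', '^'},
            /* SY */ {'_', '%', '%', '^'},
        };
    }

    /** Very rough classification, sufficient for ASCII part numbers only. */
    static int classify(char c) {
        if (c == '-') return HY;
        if (c == '/') return SY;
        if (Character.isDigit(c)) return NU;
        return AL;
    }

    /** Returns the offsets i at which a break before text.charAt(i) is allowed. */
    static List<Integer> breakOffsets(String text, char[][] table) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) {
            int before = classify(text.charAt(i - 1));
            int after = classify(text.charAt(i));
            // With no spaces in between, only a direct opportunity '_' breaks.
            if (table[before][after] == '_') {
                offsets.add(i);
            }
        }
        return offsets;
    }
}
```

With the default table, 'C-12-188-440/NH-000' yields the single offset 13
(after the forward slash). After setting table[HY][NU] = '_', the offsets
become 2, 5, 9, 13 and 16, i.e. one after each hyphen as well.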

Hope this helps.

Comment 7 Gilles Beaugeais 2007-02-22 15:51:47 UTC

Thanks, I will try your ideas.

I prefer the second one too, as the hyphen remains a hyphen
instead of being converted to an endash as in the first one.
And the second one applies only in table cells.

Comment 8 Manuel Mall 2007-02-22 15:56:24 UTC

No, the second one does not only apply in table cells - it will apply everywhere.
In the 2nd case you are modifying the 'line breaking pairs table', not line
breaking in tables.

Comment 9 Glenn Adams 2012-04-01 13:52:56 UTC

batch transition to closed remaining pre-FOP1.0 resolved bugs