Bug 53089 - Hyphenation of Uppercase Words, Combined with Underlines
Status: NEW
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: general
Version: 1.0
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-17 07:39 UTC by Thomas Schraitle
Modified: 2012-04-21 04:53 UTC (History)

Attachments
FO file showing hyphenation issue with uppercase word(s) (1.43 KB, application/xml)
2012-04-17 07:39 UTC, Thomas Schraitle
PDF output from FO file of attachment#28621 (5.76 KB, application/pdf)
2012-04-18 13:53 UTC, Thomas Schraitle

Description Thomas Schraitle 2012-04-17 07:39:29 UTC
Created attachment 28621
FO file showing hyphenation issue with uppercase word(s)

Consider the attached FO file which combines words of lowercase and uppercase letters.

As expected, the word "expected" is hyphenated correctly (example 2), and so is the uppercase "SUCCESS", even when it is combined with underlines before and after the word (see examples 4 and 5).


However, if another word is attached (as in OCF_SUCCESS), the word is no longer hyphenated at all. I don't know whether this is expected behaviour or an issue in the hyphenation patterns. Interestingly, XEP from RenderX hyphenates it as "OCF_SUC-CESS". As far as I know, XEP also uses the TeX hyphenation patterns, as FOP does.
Comment 1 Glenn Adams 2012-04-17 16:26:59 UTC
please provide a PDF output file that shows the results you are seeing
Comment 2 Glenn Adams 2012-04-17 16:54:32 UTC
the problem is in o.a.f.hyphenation.HyphenationTree#hyphenate, specifically in:

                if (!bEndOfLetters) {
                    word[i - iIgnoreAtBeginning] = (char)nc;
                } else {
                    return null;
                }

when '_' is encountered after a letter (as opposed to beginning of word), bEndOfLetters is set to true, which causes the hyphenate algorithm to bail out;
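
the bail-out described above can be modelled roughly as follows; this is a simplified sketch, not the FOP source: the class and method names are hypothetical, and Character.isLetter stands in for whatever character classification HyphenationTree actually uses

```java
public class LetterScanSketch {
    /**
     * Simplified model of the scan described in this comment.
     * Returns the letters of the word, or null when a letter follows
     * a non-letter that was itself preceded by a letter.
     */
    public static String extractLetters(String w) {
        StringBuilder letters = new StringBuilder();
        boolean endOfLetters = false;
        for (int i = 0; i < w.length(); i++) {
            char c = w.charAt(i);
            if (!Character.isLetter(c)) {
                // Assumption: leading non-letters are skipped; a non-letter
                // after at least one letter marks the end of the letters.
                if (letters.length() > 0) {
                    endOfLetters = true;
                }
            } else if (!endOfLetters) {
                letters.append(c);
            } else {
                return null; // a letter after the end marker: give up
            }
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractLetters("_SUCCESS_"));   // SUCCESS
        System.out.println(extractLetters("OCF_SUCCESS")); // null
    }
}
```

under this model "_SUCCESS_" still yields a word that can be hyphenated, while "OCF_SUCCESS" is rejected as a whole, which matches the behaviour in the report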

a better approach would be to divide the input word into segments separated by non-letter characters, hyphenate each segment separately, then collect and return the union of hyphenation points from these segments
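
a minimal sketch of that segment-wise approach, assuming a per-segment hyphenator is available as a function that returns break offsets within a segment; the class name, the method signature and the toy hyphenator are hypothetical and not part of FOP's API

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SegmentHyphenationSketch {
    /**
     * Splits the word into maximal runs of letters, hyphenates each run
     * with segmentHyphenator, and returns the break offsets relative to
     * the start of the whole word.
     */
    public static List<Integer> hyphenate(
            String word, Function<String, List<Integer>> segmentHyphenator) {
        List<Integer> points = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            if (!Character.isLetter(word.charAt(i))) {
                i++; // skip separators such as '_'
                continue;
            }
            int start = i;
            while (i < word.length() && Character.isLetter(word.charAt(i))) {
                i++;
            }
            // Shift each segment-local offset by the segment's start index.
            for (int p : segmentHyphenator.apply(word.substring(start, i))) {
                points.add(start + p);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Toy stand-in for a pattern lookup: it only knows SUC-CESS.
        Function<String, List<Integer>> toy =
            s -> s.equals("SUCCESS") ? List.of(3) : List.<Integer>of();
        System.out.println(hyphenate("OCF_SUCCESS", toy)); // [7]
    }
}
```

offset 7 corresponds to the break "OCF_SUC-CESS" that XEP produces according to the description; whether a break should also be allowed directly at the separator is left open here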

would anyone like to submit a patch?
Comment 3 Thomas Schraitle 2012-04-18 13:53:06 UTC
Created attachment 28635
PDF output from FO file of attachment#28621
Comment 4 Thomas Schraitle 2012-04-18 13:54:00 UTC
(In reply to comment #1)
> please provide a PDF output file that shows the results you are seeing

See Attachment#28635
https://issues.apache.org/bugzilla/attachment.cgi?id=28635
Comment 5 Glenn Adams 2012-04-21 04:53:15 UTC
marking this as an enhancement rather than a bug, since XSL-FO does not prescribe hyphenation behavior