27773 – [PATCH] Hyphenation

Bug 27773 - [PATCH] Hyphenation

Summary: [PATCH] Hyphenation

Status:	CLOSED FIXED

Alias:	None

Product:	Fop - Now in Jira
Classification:	Unclassified
Component:	page-master/layout (show other bugs)
Version:	trunk
Hardware:	PC Windows XP

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	fop-dev

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-03-18 12:53 UTC by Luca Furini
Modified:	2012-04-01 06:41 UTC (History)
CC List:	0 users

Attachments
patch created using cvs diff (5.35 KB, patch) 2004-03-19 11:38 UTC, Luca Furini	Details \| Diff
FO sample file to test the behavior of fop before and after the patch (1.60 KB, text/plain) 2004-03-19 11:39 UTC, Luca Furini	Details
updated diff (6.22 KB, patch) 2004-04-05 16:41 UTC, Luca Furini	Details \| Diff
proposed patch to HyphenationTree (2.12 KB, patch) 2004-04-14 19:02 UTC, Luca Furini	Details \| Diff
test fo file: words with punctuation marks and parenthesis (1.30 KB, text/plain) 2004-04-14 19:03 UTC, Luca Furini	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luca Furini 2004-03-18 12:53:51 UTC

I have tried to solve a few problems concerning hyphenation:
- show the '-' at the end of the hyphenated lines
- use the fo:hyphenate property to enable hyphenation, instead of the alignment
- specify the hyphenation character using the fo:hyphenation-character property

With this simple .fo file it is possible to see that before changes the result 
is the opposite of what expected: hyphenation is done when it shoud not, and 
not when it shoud.

<?xml version="1.0" encoding="iso-8859-1"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple" margin="0.5in">
      <fo:region-body margin="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>The following block has language="it", hyphenate="false" and 
text-align="justify"; it should NOT be hyphenated.</fo:block>
      <fo:block start-indent="0.2in" end-indent="0.2in" language="it" 
hyphenate="false" text-align="justify">Lunghissime parolone esagerevoli 
costringerebbero sillabazione auspicabilmente: precipitevolissimevolmente 
transatlantico catastrofico.</fo:block>
      <fo:block>The following block has language="it", hyphenate="true" and 
text-align="start"; it should be hyphenated.</fo:block>
      <fo:block start-indent="0.2in" end-indent="0.2in" language="it" 
hyphenate="true" text-align="start">Lunghissime parolone esagerevoli 
costringerebbero sillabazione auspicabilmente: precipitevolissimevolmente 
transatlantico catastrofico.</fo:block>
      <fo:block>The following block has language="it", hyphenate="true", text-
align="justify" and hyphenation-character="_".</fo:block>
      <fo:block start-indent="0.2in" end-indent="0.2in" language="it" 
hyphenate="true" text-align="justify" hyphenation-character="_">Lunghissime 
parolone esagerevoli costringerebbero sillabazione auspicabilmente: 
precipitevolissimevolmente transatlantico catastrofico.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>


This is the diff:

Index: xml-fop/src/java/org/apache/fop/fo/PropertyManager.java
===================================================================
RCS file: /home/cvspublic/xml-
fop/src/java/org/apache/fop/fo/PropertyManager.java,v
retrieving revision 1.24
diff -w -u -r1.24 PropertyManager.java
--- xml-fop/src/java/org/apache/fop/fo/PropertyManager.java	27 Feb 2004 
17:57:40 -0000	1.24
+++ xml-fop/src/java/org/apache/fop/fo/PropertyManager.java	18 Mar 2004 
12:43:55 -0000
@@ -494,6 +494,9 @@
             textInfo.textTransform
                     = this.propertyList.get(PR_TEXT_TRANSFORM).getEnum();
 
+            textInfo.hyphChar = this.propertyList.get(
+                                  PR_HYPHENATION_CHARACTER).getCharacter();
+
         }
         return textInfo;
     }
Index: xml-fop/src/java/org/apache/fop/fo/TextInfo.java
===================================================================
RCS file: /home/cvspublic/xml-fop/src/java/org/apache/fop/fo/TextInfo.java,v
retrieving revision 1.6
diff -w -u -r1.6 TextInfo.java
--- xml-fop/src/java/org/apache/fop/fo/TextInfo.java	27 Feb 2004 17:57:40 -
0000	1.6
+++ xml-fop/src/java/org/apache/fop/fo/TextInfo.java	18 Mar 2004 12:43:55 -
0000
@@ -50,8 +50,8 @@
     /** fo:letter-spacing property */
     public SpaceVal letterSpacing;
 
-    /** can this text be hyphenated? */
-    public boolean bCanHyphenate = true;
+    /* the hyphenation character to be used */
+    public char hyphChar = '-';
 
     /** fo:text-decoration property: is text underlined? */
     public boolean underlined = false;
Index: xml-fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java
===================================================================
RCS file: /home/cvspublic/xml-
fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java,v
retrieving revision 1.15
diff -w -u -r1.15 LineLayoutManager.java
--- xml-fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java	18 Mar 
2004 00:22:40 -0000	1.15
+++ xml-fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java	18 Mar 
2004 12:44:00 -0000
@@ -20,6 +20,7 @@
 
 import org.apache.fop.datatypes.Length;
 import org.apache.fop.fo.PropertyManager;
+import org.apache.fop.fo.Constants;
 import org.apache.fop.fo.properties.CommonMarginBlock;
 import org.apache.fop.fo.properties.CommonHyphenation;
 import org.apache.fop.layout.hyphenation.Hyphenation;
@@ -209,7 +210,7 @@
 
                     // This break position doesn't fit
                     // TODO: If we are in nowrap, we use it as is!
-                    if (bTextAlignment == TextAlign.JUSTIFY || prevBP == 
null) {
+                    if (hyphProps.hyphenate == Constants.TRUE) {
                         // If we are already in a hyphenation loop, then stop.
 
                         if (inlineLC.tryHyphenate()) {
Index: xml-fop/src/java/org/apache/fop/layoutmgr/TextLayoutManager.java
===================================================================
RCS file: /home/cvspublic/xml-
fop/src/java/org/apache/fop/layoutmgr/TextLayoutManager.java,v
retrieving revision 1.12
diff -w -u -r1.12 TextLayoutManager.java
--- xml-fop/src/java/org/apache/fop/layoutmgr/TextLayoutManager.java	17 Mar 
2004 03:37:32 -0000	1.12
+++ xml-fop/src/java/org/apache/fop/layoutmgr/TextLayoutManager.java	18 Mar 
2004 12:44:02 -0000
@@ -46,12 +46,14 @@
         private short iBreakIndex;
         private short iWScount;
         private MinOptMax ipdArea;
+        private boolean bHyphenated;
         public AreaInfo(short iSIndex, short iBIndex, short iWS,
-                 MinOptMax ipd) {
+                 MinOptMax ipd, boolean bHyph) {
             iStartIndex = iSIndex;
             iBreakIndex = iBIndex;
             iWScount = iWS;
             ipdArea = ipd;
+            bHyphenated = bHyph;
         }
     }
 
@@ -107,7 +109,7 @@
         // With CID fonts, space isn't neccesary currentFontState.width(32)
         spaceCharIPD = foText.textInfo.fs.getCharWidth(' ');
         // Use hyphenationChar property
-        hyphIPD = foText.textInfo.fs.getCharWidth('-');
+        hyphIPD = foText.textInfo.fs.getCharWidth(foText.textInfo.hyphChar);
         // Make half-space: <space> on either side of a word-space)
         SpaceVal ws = foText.textInfo.wordSpacing;
         halfWS = new SpaceVal(MinOptMax.multiply(ws.getSpace(), 0.5),
@@ -201,7 +203,7 @@
         boolean bCanHyphenate = true;
         int iStopIndex = iNextStart + hc.getNextHyphPoint();
 
-        if (textArray.length < iStopIndex || foText.textInfo.bCanHyphenate == 
false) {
+        if (textArray.length < iStopIndex) {
             iStopIndex = textArray.length;
             bCanHyphenate = false;
         }
@@ -396,7 +398,8 @@
 
         // Position is the index of the info for this word in the vector
         vecAreaInfo.add(
-          new AreaInfo(iWordStart, iNextStart, iWScount, ipd));
+          new AreaInfo(iWordStart, iNextStart, iWScount, ipd, 
+                       ((flags & BreakPoss.HYPHENATED) != 0)));
         BreakPoss bp = new BreakPoss(
                          new LeafPosition(this, vecAreaInfo.size() - 1));
         ipdTotal = ipd;
@@ -493,6 +496,11 @@
             adjust = 1;
         }
         String str = new String(textArray, iStart, ai.iBreakIndex - iStart - 
adjust);
+
+        // add hyphenation character if the last word is hyphenated
+        if (ai.bHyphenated) {
+            str += foText.textInfo.hyphChar;
+        }
 
         if (" ".equals(str)) {
             word = new Space();

Comment 1 Luca Furini 2004-03-19 11:38:03 UTC

Created attachment 10868 [details]
patch created using cvs diff

Comment 2 Luca Furini 2004-03-19 11:39:59 UTC

Created attachment 10869 [details]
FO sample file to test the behavior of fop before and after the patch

Comment 3 Luca Furini 2004-04-05 09:22:32 UTC

Hi all

I apologize for having posted some proposed changes without a line explaining 
what I was changing and why.
Really sorry, I'm going to try and do it now.

First of all, the condition now tested in the 
LineLayoutManager.getNextBreakPoss() to decide whether or not to try 
hyphenation is:
  if (bTextAlignment == TextAlign.JUSTIFY || prevBP == null) {
      ...
  }
So hyphenation is tried for *every* justified block of text; only if 
the "language" attribute is not set the getHyphenContext() method returns null 
and hyphenation is not actually done.
My suggestion is to change it so that the fo property hyphenate (7.9.4 in the 
recommendation, and already implemented in the CommonHyphenation class) is 
tested instead:
  if (hyphProps.hyphenate == Constants.TRUE) {
      ...
  }
In the TextInfo class there is a boolean attribute called "bCanHyphenate", but 
it is given a default value of true, and it is no more modified. It seems to 
me that it is quite useless.
Is is used in the TextLayoutManager.getHyphenIPD() method:
  if (textArray.length < iStopIndex || foText.textInfo.bCanHyphenate == false) 
{
      ...
  }
but this method is called only if context.tryHyphenate() is true; with my 
suggested change, context.tryHyphenate() returns true only if the fo property 
hyphenate is true, and so it is no more necessary to test 
foText.textInfo.bCanHyphenate.

The second problem is that the hyphenation character is not shown: to fix this 
I added in the AreaInfo class (which stores information about the area that 
will be generated) a boolean attribute "bHyphenated".
Is is set according to the value of the flags of the corresponding BreakPoss, 
and it is tested in the TextLayoutManager.addAreas() method to decide whether 
to add the hyphen.

Last (and least): I tried to implement the fo property hyphenation-character 
(7.9.5 in the recommendation).
As it applies to blocks, I thought to store it as a character variable in the 
TextInfo class, with initial value "-".
It is set in the PropertyManager.getTextLayoutProps() method, together with 
the other TextInfo attribute.
Is is used to calculate the hyphIPD in the TextLayoutManager constructor, and 
in the addAreas method.

Bye
    Luca

Comment 4 Luca Furini 2004-04-05 16:40:13 UTC

Sorry, I'm here again (hope I am not making too much a mess!) :-)

I'm going to attach an updated proposed patch, as I noticed there was an error 
in the one I posted before.
In the TextLayoutManager.addAreas() method, the hyphen size must be added to 
the total inline progression dimension of the line *before* the adjustment 
calculation; this involves also moving a few lines of code.

Bye
    Luca

Comment 5 Luca Furini 2004-04-05 16:41:56 UTC

Created attachment 11138 [details]
updated diff

Comment 6 Simon Pepping 2004-04-09 09:28:43 UTC

Hi Luca,

Your patch looks good. It works fine on your test fo, and it also
works fine on a test fo I had lying around.

There are a few things to note:

AreaInfo.bHyphenated is identical to (flags & BreakPoss.HYPHENATED);
ai.ipdArea = MinOptMax.add(ai.ipdArea, new MinOptMax(hyphIPD)) is
equivalent to bp.setStackingSize(MinOptMax.add(ipd, new
MinOptMax(hyphIPD))). In other words, the same information is present
in the BreakPoss, which is held by the LineLM. Unfortunately, this
information is not accessible in the addAreas method of TextLM. This
is so because posIter.next() returns the position member of the
BreakPoss, not the BreakPoss itself. I believe that is an error in
PositionIterator which eventually needs correction. For now your
solution is a good one.

Your proposed TextInfo.hypChar is also known as
LineLM.hyphProps.hyphenationChar. This information is not passed on to
childLMs and thus not known to TextLM. Storing it in TextInfo seems a
good idea; perhaps it is not even needed in hyphProps.

It is not completely clear that 

+        if (ai.bHyphenated) {
+            str += foText.textInfo.hyphChar;
+            ai.ipdArea = MinOptMax.add(ai.ipdArea, new MinOptMax(hyphIPD));
+        }

is always correct. This happens on the last AI returned by
posIter.next(), which is not necessarily the last one in the line. But
it is hard to conceive of an AI which is last in its LM and also
hyphenated; therefore it will work. Ideally, the LineLM indicates
which BP is the one ending the line; perhaps this needs correction at
some time.

Regards, Simon

Comment 7 Luca Furini 2004-04-14 19:00:09 UTC

Yes, I was quite surprised to see that all the information stored in the 
BreakPoss was "thrown away" before adding areas; I chose to duplicate the 
needed values because this involved fewer and less radical changes.

I have found another small bug concerning hyphenation in the 
HyphenationTree.hyphenate() method.
Before checking the exception list or using the algorithm, the 
function "normalizes" the word: during this phase, if a non-letter character 
is found null is returned.
    // normalize word
    char[] c = new char[2];
    for (i = 1; i <= len; i++) {
        c[0] = w[offset + i - 1];
        int nc = classmap.find(c, 0);
        if (nc < 0) {    // found a non-letter character, abort
            return null;
        }
        word[i] = (char)nc;
    }
I think the condition (nc < 0) is too strong: at the moment words followed by 
punctuation marks, or in parenthesis, are not hyphenated.

This is how I tried to fix this problem:
- non-letter characters at the beginning are not copied into word[]
- if a non-letter character is found which is not at the beginning, it is not 
copied into word[] and a boolean variable becomes true
- if a letter-character is found when the variable is true, null is returned; 
otherwise, word[] is used to find hyphenation points

I have also added a little optimization: if, after the normalization and the 
non-letter character removal, the word size is less than (remainCharCount + 
pushCharCount), null is returned, without checking the exception list and 
performing the algorithm.

I'm going to attach the proposed patch and a test fo file which shows a few 
examples.

Regards

    Luca

Comment 8 Luca Furini 2004-04-14 19:02:13 UTC

Created attachment 11237 [details]
proposed patch to HyphenationTree

Comment 9 Luca Furini 2004-04-14 19:03:27 UTC

Created attachment 11238 [details]
test fo file: words with punctuation marks and parenthesis

Comment 10 Simon Pepping 2004-05-29 08:56:23 UTC

Applied. Thanks.

Simon

Comment 11 Glenn Adams 2012-04-01 06:41:12 UTC

batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed