I have tried to solve a few problems concerning hyphenation: - show the '-' at the end of the hyphenated lines - use the fo:hyphenate property to enable hyphenation, instead of the alignment - specify the hyphenation character using the fo:hyphenation-character property With this simple .fo file it is possible to see that before changes the result is the opposite of what expected: hyphenation is done when it shoud not, and not when it shoud. <?xml version="1.0" encoding="iso-8859-1"?> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <fo:layout-master-set> <fo:simple-page-master master-name="simple" margin="0.5in"> <fo:region-body margin="1in"/> </fo:simple-page-master> </fo:layout-master-set> <fo:page-sequence master-reference="simple"> <fo:flow flow-name="xsl-region-body"> <fo:block>The following block has language="it", hyphenate="false" and text-align="justify"; it should NOT be hyphenated.</fo:block> <fo:block start-indent="0.2in" end-indent="0.2in" language="it" hyphenate="false" text-align="justify">Lunghissime parolone esagerevoli costringerebbero sillabazione auspicabilmente: precipitevolissimevolmente transatlantico catastrofico.</fo:block> <fo:block>The following block has language="it", hyphenate="true" and text-align="start"; it should be hyphenated.</fo:block> <fo:block start-indent="0.2in" end-indent="0.2in" language="it" hyphenate="true" text-align="start">Lunghissime parolone esagerevoli costringerebbero sillabazione auspicabilmente: precipitevolissimevolmente transatlantico catastrofico.</fo:block> <fo:block>The following block has language="it", hyphenate="true", text- align="justify" and hyphenation-character="_".</fo:block> <fo:block start-indent="0.2in" end-indent="0.2in" language="it" hyphenate="true" text-align="justify" hyphenation-character="_">Lunghissime parolone esagerevoli costringerebbero sillabazione auspicabilmente: precipitevolissimevolmente transatlantico catastrofico.</fo:block> </fo:flow> </fo:page-sequence> </fo:root> This is the diff: Index: xml-fop/src/java/org/apache/fop/fo/PropertyManager.java =================================================================== RCS file: /home/cvspublic/xml- fop/src/java/org/apache/fop/fo/PropertyManager.java,v retrieving revision 1.24 diff -w -u -r1.24 PropertyManager.java --- xml-fop/src/java/org/apache/fop/fo/PropertyManager.java 27 Feb 2004 17:57:40 -0000 1.24 +++ xml-fop/src/java/org/apache/fop/fo/PropertyManager.java 18 Mar 2004 12:43:55 -0000 @@ -494,6 +494,9 @@ textInfo.textTransform = this.propertyList.get(PR_TEXT_TRANSFORM).getEnum(); + textInfo.hyphChar = this.propertyList.get( + PR_HYPHENATION_CHARACTER).getCharacter(); + } return textInfo; } Index: xml-fop/src/java/org/apache/fop/fo/TextInfo.java =================================================================== RCS file: /home/cvspublic/xml-fop/src/java/org/apache/fop/fo/TextInfo.java,v retrieving revision 1.6 diff -w -u -r1.6 TextInfo.java --- xml-fop/src/java/org/apache/fop/fo/TextInfo.java 27 Feb 2004 17:57:40 - 0000 1.6 +++ xml-fop/src/java/org/apache/fop/fo/TextInfo.java 18 Mar 2004 12:43:55 - 0000 @@ -50,8 +50,8 @@ /** fo:letter-spacing property */ public SpaceVal letterSpacing; - /** can this text be hyphenated? */ - public boolean bCanHyphenate = true; + /* the hyphenation character to be used */ + public char hyphChar = '-'; /** fo:text-decoration property: is text underlined? */ public boolean underlined = false; Index: xml-fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java =================================================================== RCS file: /home/cvspublic/xml- fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java,v retrieving revision 1.15 diff -w -u -r1.15 LineLayoutManager.java --- xml-fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java 18 Mar 2004 00:22:40 -0000 1.15 +++ xml-fop/src/java/org/apache/fop/layoutmgr/LineLayoutManager.java 18 Mar 2004 12:44:00 -0000 @@ -20,6 +20,7 @@ import org.apache.fop.datatypes.Length; import org.apache.fop.fo.PropertyManager; +import org.apache.fop.fo.Constants; import org.apache.fop.fo.properties.CommonMarginBlock; import org.apache.fop.fo.properties.CommonHyphenation; import org.apache.fop.layout.hyphenation.Hyphenation; @@ -209,7 +210,7 @@ // This break position doesn't fit // TODO: If we are in nowrap, we use it as is! - if (bTextAlignment == TextAlign.JUSTIFY || prevBP == null) { + if (hyphProps.hyphenate == Constants.TRUE) { // If we are already in a hyphenation loop, then stop. if (inlineLC.tryHyphenate()) { Index: xml-fop/src/java/org/apache/fop/layoutmgr/TextLayoutManager.java =================================================================== RCS file: /home/cvspublic/xml- fop/src/java/org/apache/fop/layoutmgr/TextLayoutManager.java,v retrieving revision 1.12 diff -w -u -r1.12 TextLayoutManager.java --- xml-fop/src/java/org/apache/fop/layoutmgr/TextLayoutManager.java 17 Mar 2004 03:37:32 -0000 1.12 +++ xml-fop/src/java/org/apache/fop/layoutmgr/TextLayoutManager.java 18 Mar 2004 12:44:02 -0000 @@ -46,12 +46,14 @@ private short iBreakIndex; private short iWScount; private MinOptMax ipdArea; + private boolean bHyphenated; public AreaInfo(short iSIndex, short iBIndex, short iWS, - MinOptMax ipd) { + MinOptMax ipd, boolean bHyph) { iStartIndex = iSIndex; iBreakIndex = iBIndex; iWScount = iWS; ipdArea = ipd; + bHyphenated = bHyph; } } @@ -107,7 +109,7 @@ // With CID fonts, space isn't neccesary currentFontState.width(32) spaceCharIPD = foText.textInfo.fs.getCharWidth(' '); // Use hyphenationChar property - hyphIPD = foText.textInfo.fs.getCharWidth('-'); + hyphIPD = foText.textInfo.fs.getCharWidth(foText.textInfo.hyphChar); // Make half-space: <space> on either side of a word-space) SpaceVal ws = foText.textInfo.wordSpacing; halfWS = new SpaceVal(MinOptMax.multiply(ws.getSpace(), 0.5), @@ -201,7 +203,7 @@ boolean bCanHyphenate = true; int iStopIndex = iNextStart + hc.getNextHyphPoint(); - if (textArray.length < iStopIndex || foText.textInfo.bCanHyphenate == false) { + if (textArray.length < iStopIndex) { iStopIndex = textArray.length; bCanHyphenate = false; } @@ -396,7 +398,8 @@ // Position is the index of the info for this word in the vector vecAreaInfo.add( - new AreaInfo(iWordStart, iNextStart, iWScount, ipd)); + new AreaInfo(iWordStart, iNextStart, iWScount, ipd, + ((flags & BreakPoss.HYPHENATED) != 0))); BreakPoss bp = new BreakPoss( new LeafPosition(this, vecAreaInfo.size() - 1)); ipdTotal = ipd; @@ -493,6 +496,11 @@ adjust = 1; } String str = new String(textArray, iStart, ai.iBreakIndex - iStart - adjust); + + // add hyphenation character if the last word is hyphenated + if (ai.bHyphenated) { + str += foText.textInfo.hyphChar; + } if (" ".equals(str)) { word = new Space();
Created attachment 10868 [details] patch created using cvs diff
Created attachment 10869 [details] FO sample file to test the behavior of fop before and after the patch
Hi all I apologize for having posted some proposed changes without a line explaining what I was changing and why. Really sorry, I'm going to try and do it now. First of all, the condition now tested in the LineLayoutManager.getNextBreakPoss() to decide whether or not to try hyphenation is: if (bTextAlignment == TextAlign.JUSTIFY || prevBP == null) { ... } So hyphenation is tried for *every* justified block of text; only if the "language" attribute is not set the getHyphenContext() method returns null and hyphenation is not actually done. My suggestion is to change it so that the fo property hyphenate (7.9.4 in the recommendation, and already implemented in the CommonHyphenation class) is tested instead: if (hyphProps.hyphenate == Constants.TRUE) { ... } In the TextInfo class there is a boolean attribute called "bCanHyphenate", but it is given a default value of true, and it is no more modified. It seems to me that it is quite useless. Is is used in the TextLayoutManager.getHyphenIPD() method: if (textArray.length < iStopIndex || foText.textInfo.bCanHyphenate == false) { ... } but this method is called only if context.tryHyphenate() is true; with my suggested change, context.tryHyphenate() returns true only if the fo property hyphenate is true, and so it is no more necessary to test foText.textInfo.bCanHyphenate. The second problem is that the hyphenation character is not shown: to fix this I added in the AreaInfo class (which stores information about the area that will be generated) a boolean attribute "bHyphenated". Is is set according to the value of the flags of the corresponding BreakPoss, and it is tested in the TextLayoutManager.addAreas() method to decide whether to add the hyphen. Last (and least): I tried to implement the fo property hyphenation-character (7.9.5 in the recommendation). As it applies to blocks, I thought to store it as a character variable in the TextInfo class, with initial value "-". It is set in the PropertyManager.getTextLayoutProps() method, together with the other TextInfo attribute. Is is used to calculate the hyphIPD in the TextLayoutManager constructor, and in the addAreas method. Bye Luca
Sorry, I'm here again (hope I am not making too much a mess!) :-) I'm going to attach an updated proposed patch, as I noticed there was an error in the one I posted before. In the TextLayoutManager.addAreas() method, the hyphen size must be added to the total inline progression dimension of the line *before* the adjustment calculation; this involves also moving a few lines of code. Bye Luca
Created attachment 11138 [details] updated diff
Hi Luca, Your patch looks good. It works fine on your test fo, and it also works fine on a test fo I had lying around. There are a few things to note: AreaInfo.bHyphenated is identical to (flags & BreakPoss.HYPHENATED); ai.ipdArea = MinOptMax.add(ai.ipdArea, new MinOptMax(hyphIPD)) is equivalent to bp.setStackingSize(MinOptMax.add(ipd, new MinOptMax(hyphIPD))). In other words, the same information is present in the BreakPoss, which is held by the LineLM. Unfortunately, this information is not accessible in the addAreas method of TextLM. This is so because posIter.next() returns the position member of the BreakPoss, not the BreakPoss itself. I believe that is an error in PositionIterator which eventually needs correction. For now your solution is a good one. Your proposed TextInfo.hypChar is also known as LineLM.hyphProps.hyphenationChar. This information is not passed on to childLMs and thus not known to TextLM. Storing it in TextInfo seems a good idea; perhaps it is not even needed in hyphProps. It is not completely clear that + if (ai.bHyphenated) { + str += foText.textInfo.hyphChar; + ai.ipdArea = MinOptMax.add(ai.ipdArea, new MinOptMax(hyphIPD)); + } is always correct. This happens on the last AI returned by posIter.next(), which is not necessarily the last one in the line. But it is hard to conceive of an AI which is last in its LM and also hyphenated; therefore it will work. Ideally, the LineLM indicates which BP is the one ending the line; perhaps this needs correction at some time. Regards, Simon
Yes, I was quite surprised to see that all the information stored in the BreakPoss was "thrown away" before adding areas; I chose to duplicate the needed values because this involved fewer and less radical changes. I have found another small bug concerning hyphenation in the HyphenationTree.hyphenate() method. Before checking the exception list or using the algorithm, the function "normalizes" the word: during this phase, if a non-letter character is found null is returned. // normalize word char[] c = new char[2]; for (i = 1; i <= len; i++) { c[0] = w[offset + i - 1]; int nc = classmap.find(c, 0); if (nc < 0) { // found a non-letter character, abort return null; } word[i] = (char)nc; } I think the condition (nc < 0) is too strong: at the moment words followed by punctuation marks, or in parenthesis, are not hyphenated. This is how I tried to fix this problem: - non-letter characters at the beginning are not copied into word[] - if a non-letter character is found which is not at the beginning, it is not copied into word[] and a boolean variable becomes true - if a letter-character is found when the variable is true, null is returned; otherwise, word[] is used to find hyphenation points I have also added a little optimization: if, after the normalization and the non-letter character removal, the word size is less than (remainCharCount + pushCharCount), null is returned, without checking the exception list and performing the algorithm. I'm going to attach the proposed patch and a test fo file which shows a few examples. Regards Luca
Created attachment 11237 [details] proposed patch to HyphenationTree
Created attachment 11238 [details] test fo file: words with punctuation marks and parenthesis
Applied. Thanks. Simon
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed