29124 – New line breaking algorithm

Bug 29124 - New line breaking algorithm

Summary: New line breaking algorithm

Status:	CLOSED FIXED

Alias:	None

Product:	Fop - Now in Jira
Classification:	Unclassified
Component:	general (show other bugs)
Version:	trunk
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	---
Assignee:	fop-dev

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-05-20 15:30 UTC by Luca Furini
Modified:	2012-04-01 06:27 UTC (History)
CC List:	0 users

Attachments
patch to existing files (86.44 KB, patch) 2004-05-20 15:31 UTC, Luca Furini	Details \| Diff
KnuthElement.java (1.21 KB, patch) 2004-05-20 15:31 UTC, Luca Furini	Details \| Diff
KnuthBox.java (221 bytes, patch) 2004-05-20 15:32 UTC, Luca Furini	Details \| Diff
KnuthGlue.java (534 bytes, patch) 2004-05-20 15:32 UTC, Luca Furini	Details \| Diff
KnuthPenalty.java (496 bytes, patch) 2004-05-20 15:32 UTC, Luca Furini	Details \| Diff
KnuthPossPosIter.java (1.88 KB, patch) 2004-05-20 15:33 UTC, Luca Furini	Details \| Diff
fo test file (text-indent) (1.27 KB, text/plain) 2004-05-20 15:34 UTC, Luca Furini	Details
fo test file (text-align-last) (1.88 KB, text/plain) 2004-05-20 15:34 UTC, Luca Furini	Details
fo test file (word-spacing and letter-spacing) (2.25 KB, text/plain) 2004-05-20 15:35 UTC, Luca Furini	Details
fo test file (long paragraphs of text) (3.66 KB, text/plain) 2004-05-20 15:36 UTC, Luca Furini	Details
LineLayoutManager.java (53.02 KB, patch) 2004-05-21 17:21 UTC, Luca Furini	Details \| Diff
TextLayoutManager.java (36.30 KB, patch) 2004-05-21 17:22 UTC, Luca Furini	Details \| Diff
patch (second edition) (128.02 KB, patch) 2004-08-03 12:45 UTC, Luca Furini	Details \| Diff
test fo file (10.46 KB, text/plain) 2004-08-03 12:45 UTC, Luca Furini	Details
patch file (third edition, minor oversight fixed) (128.01 KB, patch) 2004-08-05 16:11 UTC, Luca Furini	Details \| Diff
patch file (fourth edition, including HyphContext and bug fixes) (128.95 KB, patch) 2004-08-12 08:25 UTC, Luca Furini	Details \| Diff
patch - existing files (fifth edition) (122.81 KB, patch) 2004-08-19 16:36 UTC, Luca Furini	Details \| Diff
patch - new files (fifth edition) (5.04 KB, patch) 2004-08-19 16:36 UTC, Luca Furini	Details \| Diff
test fo: Nested inline and other LMs (6.09 KB, text/xml) 2004-08-19 19:32 UTC, Simon Pepping	Details
test fo: Justification (4.56 KB, text/xml) 2004-08-19 19:32 UTC, Simon Pepping	Details
test fo: No breakpoints (889 bytes, text/xml) 2004-08-19 19:33 UTC, Simon Pepping	Details
patch to existing files (sixth edition) (125.35 KB, patch) 2004-08-24 15:25 UTC, Luca Furini	Details \| Diff
patch to existing files (version 7.1) (138.89 KB, patch) 2004-08-28 10:09 UTC, Luca Furini	Details \| Diff
patch to existing files (tested) (128.42 KB, patch) 2004-08-30 10:29 UTC, Luca Furini	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luca Furini 2004-05-20 15:30:12 UTC

Ok, so here is the patch I was talking about in the fop-dev mailing list.
I'm going to attach:
- the patch concerning existing files
- 5 new .java files (diff did not consider them)
- a few test fo files showing some properties (text-indent, text-align-last, 
word-spacing & letter-spacing)
- an fo file with some long paragraphs, which can be used to make space width 
comparison; for example, HEAD and xep have very large spaces in some lines of 
the first paragraph.

I try to summarize briefly the algorithm, but I'm not so sure I can be both 
clear and exhaustive :-) 

The first step in the algorithm is converting the paragraph into a sequence of 
elements; there are 3 kind of elements:
- "box" elements, representing words or whatever else; a box has a width
- "glue" elements, representing adjustable spaces between boxes; a glue has an 
optimal width, a potential stretch and a potential shrink, i.e. its length is 
in the range [width - shrink, width + stretch]
- "penalty" items represents positions in which a line could be broken; a 
penalty has a width (i.e. the "-" width when hyphenating), a penalty value and 
a boolean; the penalty value is a kind of "aesthetic cost", and it could be a 
finite number, +infinite (where there can't be a break) or -infinite (a forced 
break); penalties with value = 0 can be omitted

For example, the text "life is beautiful" will generate the sequence:

 box - glue - box - glue - box - penalty - box - penalty - box
  ^            ^            ^       ^       ^       ^       ^
  |            |            |       |       |       |       |
 life         is           beau     -       ti      -      ful

Then, the algorithm finds the breaking points:
For each element in the paragraph, it verifies if this element is a "feasible 
break", i.e. there is a previous "active" feasible break such that the 
elements between the two break can fill a line, using the available stretch 
and shrink.
If the element is a feasible break, the algorithm computes a "score" depending 
on the needed adjustment and the penalty (the lower is the score, the better 
is the break), and create a structure representing this element as an active 
feasible break.
Note that the number of active feasible break at each moment is quite small, 
as feasible breaks are deactivated if they are too close to the new one.

Some implementation notes:
1)
In order to minimize changes in other files, the LLM interacts with the TLM 
calling brand new methods that I added to the LayoutManager interface, 
providing null implementation in the AbstractLayoutManager. I also added a 
KnuthPossPosIter, which is almost a clone of BreakPossPosIter.
I think a much more elegant solution would be having the LLM call 
getNextBreakPoss, and casting the returned object to KnuthElement. Maybe 
BreakPoss and KnuthElement could be subclasses of a same class, abstract, 
containing only a Position object and the method to access it:

        BreakPoss (same name, but a different class)
            |
     ---------------
     |             |
 KnuthElement    TheClassPreviouslyKnownAsBreakPoss

All LM call getNextBreakPoss, but the LLM casts to KnuthElement and the other 
LM cast to "TheClassPreviouslyKnownAsBreakPoss"

2)
The letter space is at the moment set to .min = .opt and .max = .opt as there 
are a few problems with adjustable letter space (Knuth's original algorithm 
does not handle letter space); I also have some problems in counting the 
letter spaces in a hyphenated word.

3)
The LLM tries the first time to find breaking point without word hyphenation; 
only if it doesn't find a set of break points, it calls again the algorithm 
after having hyphenated all words, so this patch includes the ones concerning 
hyphenation (bug 27773) and hyphenation of word with punctuation marks (bug 
28431).
I tried to use the old information (about the whole words) as much as 
possible, but I don't know whether this saves some time or it is just an 
unneeded complication.

4)
Concerning word and letter spacing, one problem is that at the moment I see no 
way to decide whether the fo property letter-spacing (or word-spacing) was set 
to 0 (and fop should respect this value), or it wasn't set and 0 is its 
default value (and fop could modify it in order to justify).

Comments, suggestions, etc. are most welcome!

Regards
    Luca

Comment 1 Luca Furini 2004-05-20 15:31:18 UTC

Created attachment 11602 [details]
patch to existing files

Comment 2 Luca Furini 2004-05-20 15:31:47 UTC

Created attachment 11603 [details]
KnuthElement.java

Comment 3 Luca Furini 2004-05-20 15:32:15 UTC

Created attachment 11604 [details]
KnuthBox.java

Comment 4 Luca Furini 2004-05-20 15:32:38 UTC

Created attachment 11605 [details]
KnuthGlue.java

Comment 5 Luca Furini 2004-05-20 15:32:58 UTC

Created attachment 11606 [details]
KnuthPenalty.java

Comment 6 Luca Furini 2004-05-20 15:33:41 UTC

Created attachment 11607 [details]
KnuthPossPosIter.java

Comment 7 Luca Furini 2004-05-20 15:34:31 UTC

Created attachment 11608 [details]
fo test file (text-indent)

Comment 8 Luca Furini 2004-05-20 15:34:55 UTC

Created attachment 11609 [details]
fo test file (text-align-last)

Comment 9 Luca Furini 2004-05-20 15:35:30 UTC

Created attachment 11610 [details]
fo test file (word-spacing and letter-spacing)

Comment 10 Luca Furini 2004-05-20 15:36:01 UTC

Created attachment 11611 [details]
fo test file (long paragraphs of text)

Comment 11 Luca Furini 2004-05-21 17:21:41 UTC

Created attachment 11625 [details]
LineLayoutManager.java

Comment 12 Luca Furini 2004-05-21 17:22:17 UTC

Created attachment 11626 [details]
TextLayoutManager.java

Comment 13 Luca Furini 2004-08-03 12:41:51 UTC

Hi all

At long last, I have finished the patch implementing Knuth's line breaking 
algorithm; it took me more than I expected, mainly because of a long sequence 
of hw and sw troubles ... Murphy's laws are not something to laugh at! :-)

I have worked on [Line, Text, InlineStacking, LeafNode]LM, so the algorithm 
should work well with any fo file containing text, leaders, characters, 
inlines and the other formatting objects handled by a LeafNodeLM (external 
graphics, pagenumbers and citations).

The general idea of Knuth algorithm is:

  try to find breaking points without hyphenating words
  if this fails
    hyphenate all words
    try again

The "hyphenate all words" phase could be time-expansive, so this step is 
performed trying to use as much as possible the information already known, and 
to minimize the changes to the existing sequence of elements.
The old sequence is used to collect word fragments, and elements are replaced 
only if the LM which created them has something to change.
So, "hyphenate all words" means:

  scan the old sequence once:
    collect word fragments
    hyphenate word
  scan the old sequence once more:
    if the LM which returned this element has changed something
      replace all elements returned by this LM
 
These are the new methods added; at the moment, I added them to the 
LayoutManager interface, but maybe it could be better to create a new 
interface implementd only by LM returning inline areas.

+ getNextKnuthElement()
This is used instead of getNextBreakPoss().
The next step (I have already started working on it) would be to use the same 
method all LM use.

+ addALetterSpaceTo()
The low-level LMs (TLMs, LNLMs) have only a partial view of the text, and 
therefore cannot know the exact number of letter spaces, while the LineLM has 
a full view.
If a TLM's text is "Tex", it can only suppose it has 2 letter spaces; if the 
following formatting object is a character "t", the LineLM tells the TLM to 
add a letter space, as the "x" is not the last letter of the word.

+ getWordChars()
This is not a new method, it just has different parameters; text is collected 
from fo:characters too.

+ hyphenate()
The TLM does not apply the changes to vecAreaInfo immediately, otherwise the 
existing Position objects stored in the old sequence couldn't be used any 
more. The LeafNodeLM returns a single area, so it can apply changes immediatly.

+ applyChanges()
This method tells the TLM to apply the changes to vecAreaInfo; all LM returns 
true if something is changed or false otherwise, so the LLM knows whether it 
has to replace the old elements or not.

+ getChangedKnuthElement()
This is used by the LLM to obtain the new elements.

+ getWordSpaceIPD()
This is used by the LLM to ask for the word space dimension; the LLM needs it 
to center text.

A few details to fix:

- word spacing and letter spacing are now fully implemented, they can both 
have MinOptMax values; but I am still thinking about how to differentiate a 
user-defined zero value from a default zero value ...

- Leaders with leader-pattern = "rule" or "space" work well; with "dots" the 
space left is right, but the dots don't fill it properly. Leaders with leader-
pattern="use-content" don't work, as the ContentLayoutManager has at the 
moment only a null implementation of the method getNextKnuthElement.
There is also a minor bug concerning (IMO) white space handling: if there 
white space both before and after the leader, the latter one is removed, so 
instead of
  word ______ word
the output shows
  word ______word

- with the other fo elements (fo:externalgraphic, fo:page-number and fo:page-
number-citation) the LeafNodeLM behave exactly the same way as with the old 
code, i.e. a fo:page-number-citation generates a "?  ".

- text-align-last is partially implemented; text-align-last = "justify" works 
only if text-align = "justify" too; this is because Knuth's algorithm doesn't 
provide for a different alignment for the last line.

I'm going to attach:
- the patch to existing files and new files
- a test fo file

Comment 14 Luca Furini 2004-08-03 12:45:04 UTC

Created attachment 12308 [details]
patch (second edition)

Comment 15 Luca Furini 2004-08-03 12:45:31 UTC

Created attachment 12309 [details]
test fo file

Comment 16 Luca Furini 2004-08-03 13:15:58 UTC

I just realized that in the patch there is a line commented out, while it 
shouldn't be. 

It's in the LineLayoutManager, in the method getNextBreakPoss();
the code at the moment is:

  ...
  if (true) {
  //if ((iBPcount = findBreakingPoints(currPar, context.getStackLimit().opt, 
maxAdjustment)) == 0) {
  ...

The commented line should be uncommented, and the "if (true) {" removed.
Anyway, the program works all the same.

Sorry for the oversight!
    Luca

Comment 17 Glen Mazza 2004-08-03 13:32:14 UTC

Thanks, you will need to add the comments in this patch to the *code*, people 
will not always have the benefit of this Bugzilla entry when looking at it.  
This is extremely important, as layout is very complex.

Also, is this Knuth algorithm copyrighted?  Where did you get it from?  It is 
rare that we have classes named after authors.

Glen

Comment 18 Glen Mazza 2004-08-03 14:42:55 UTC

Thanks again for you work--we have very few who can work on layout.

Apparently, Dr. Knuth wouldn't seem to mind using his algorithms:
http://lpf.ai.mit.edu/Patents/knuth-to-pto.txt

Comment 19 Clay Leeds 2004-08-03 15:00:20 UTC

Luca, I also thank you for your time and commitment. Grazie mille^H^H^H^H^H un miliardo! I'm sure it 
is very much appreciated by the other COMMITTERs, as well as the throngs who will benefit from your 
time and energy in the future. This is a very exciting addition to FOP, and I'm hoping it will help to 
simplify the code in other ways as well. It's really nice to have a multitude of people who 'capish' (grok) 
the inner workings of FOP.

Web Maestro Clay

Comment 20 Luca Furini 2004-08-05 16:09:37 UTC

I'm going to attach the corrected patch, sorry again!

Knuth's algorithm is described in the essay
  D. E. Knuth and M. F. Plass, "Breaking paragraphs into lines"
and I found it in the book
  D. E. Knuth, "Digital typography", published by CSLI Publications
Unfortunately, I couldn't find any link to an on-line version of this essay.

As regards the names of the classes, they were mainly devised to detect 
quickly the new files among the others! So, it's not a problem for me to 
change them :-)

    Luca

Comment 21 Luca Furini 2004-08-05 16:11:08 UTC

Created attachment 12345 [details]
patch file (third edition, minor oversight fixed)

Comment 22 Simon Pepping 2004-08-10 19:28:14 UTC

Luca,

Your patch applied cleanly to a checkout from CVS dated 1 August 2004,
but I get build errors. Could it be that differences to HyphContext
are not included in the patch?

    [javac] src/java/org/apache/fop/layoutmgr/LineLayoutManager.java:1081:
cannot resolve symbol
    [javac] symbol  : constructor HyphContext (int[],int)
    [javac] location: class org.apache.fop.layoutmgr.HyphContext
    [javac]             return new HyphContext(hyph.getHyphenationPoints(),
sbChars.length());
    [javac]                    ^
    [javac] src/java/org/apache/fop/layoutmgr/TextLayoutManager.java:425: cannot
resolve symbol
    [javac] symbol  : method isWordEnd ()
    [javac] location: class org.apache.fop.layoutmgr.HyphContext
    [javac]             newIPD.add(MinOptMax.multiply(letterSpaceIPD,
(hc.isWordEnd()?
    [javac]                                                              ^
    [javac] src/java/org/apache/fop/layoutmgr/TextLayoutManager.java:437: cannot
resolve symbol
    [javac] symbol  : method isWordEnd ()
    [javac] location: class org.apache.fop.layoutmgr.HyphContext
    [javac]                                                              
(short)0, (short)(hc.isWordEnd()? (iStopIndex - iStartIndex - 1): (iStopIndex -
iStartIndex)),
    [javac]                                                                    
              ^
    [javac] 3 errors

Simon

Comment 23 Luca Furini 2004-08-12 08:23:13 UTC

I'm going to attach an updated patch, including HyphContext (which I forgot to 
include in the previous versions, sorry) and a few changes to fix a couple of 
bugs.

I used linux's diff between the modified files and the original ones (updated 
yesterday, 11 August); for some reasons (maybe I use some wrong options) 
wincvs's diff did not include new files and did not use the latest version of 
the original files, so finding lots of difference due to recent cvs commits.

I hope I did not forget anything this time! :-)

    Luca

Comment 24 Luca Furini 2004-08-12 08:25:01 UTC

Created attachment 12400 [details]
patch file (fourth edition, including HyphContext and bug fixes)

Comment 25 Simon Pepping 2004-08-16 19:13:37 UTC

Hi Luca,

I have problems with the results for text-align="start", "end" and
"center" in your test FO file. The lines are too long, and the first
line ends with a space. For start and end a paragraph is
dropped. Herewith the area tree output for start/end and center:

                  <block width="288000" ipd="288000" height="30800"
props="border-start:(87,#ffff00,1000);break-after:8;border-end:(87,#ffff00,1000);border-after:(87,#ffff00,1000);border-before:(87,#ffff00,1000);break-before:8;">
                    <lineArea height="14400">
                      <text twsadjust="0" tlsadjust="0"
props="font-size:12000;font-family:F1;color:#000000;">Poche corte parole molto
corte, in modo che tutto vada bene e non ci siano guai. Qualche </text>
                    </lineArea>
                    <lineArea height="14400">
                      <text twsadjust="0" tlsadjust="0"
props="font-size:12000;font-family:F1;color:#000000;">tra parola per fare tre
righe.</text>
                    </lineArea>
                  </block>

                  <block width="288000" ipd="288000" height="30800"
props="border-start:(87,#0000ff,1000);break-after:8;border-end:(87,#0000ff,1000);border-after:(87,#0000ff,1000);border-before:(87,#0000ff,1000);break-before:8;">
                    <lineArea height="14400">
                      <text twsadjust="0" tlsadjust="0"
props="font-size:12000;font-family:F1;color:#000000;">Poche corte parole molto
corte, in modo che tutto vada bene e non </text>
                    </lineArea>
                    <lineArea height="14400">
                      <text twsadjust="0" tlsadjust="0"
props="font-size:12000;font-family:F1;color:#000000;">ci siano guai. Qualche
altra paro</text>
                      <text twsadjust="0" tlsadjust="0"
props="font-size:12000;font-family:F1;color:#000000;">la per fare tre righe.</text>
                    </lineArea>
                  </block>

Regards, Simon

Comment 26 Luca Furini 2004-08-17 13:51:09 UTC

Oops, sorry again, Simon!

In the code I used when creating the last patch there was an error affecting 
the TextLayoutManager.getChangedKnuthElements() method: two missing "break" 
inside a switch.
Due to this error, the sequence of elements generated for each space (when 
text-align is center, start or end) is wrong, and some text "disappears" (I 
even got IndexOutOfBounds exceptions).
Inserting these breaks is enough to make everything work:

    ...
    // ai refers to a space
    switch (alignment) {
        case CENTER : ...
                      iReturnedIndex ++;
                      break;             /* this was missing */
        case START  : // fall through
        case END    : ...
                      iReturnedIndex ++;
                      break;             /* this was missing */
        case JUSTIFY: ...
    }
    ...

As you can see, in the last patch I changed the getNextKnuthElements() and 
getChangedKnuthElements() return type, so they now return a sequence of 
elements instead of a single one.
This maybe reduces similarities between getNextKnuthElements() and 
getNextBreakPoss(), but I think it makes the code simpler and easier to 
understand. Maybe it would be even better to make them return the whole 
sequence, so that these methods are called once per LM.

Now I'm working on the newly-created LMs, so next patch (which I think will be 
ready tomorrow) will apply to the latest code version and will include Finn 
Bock's changes.

Regards,
    Luca

Comment 27 Luca Furini 2004-08-19 16:33:56 UTC

Hi again

I have updated the patch, so it now includes the latest changes and the new 
LayoutManagers (CharacterLM and LeaderLM).

I also fixed the text-align-last property, so it now works well even if  text-
align is not "justify".

Regards,
    Luca

Comment 28 Luca Furini 2004-08-19 16:36:01 UTC

Created attachment 12488 [details]
patch - existing files (fifth edition)

Comment 29 Luca Furini 2004-08-19 16:36:46 UTC

Created attachment 12489 [details]
patch - new files (fifth edition)

Comment 30 Simon Pepping 2004-08-19 19:28:38 UTC

The last patches do not apply without errors:
- Last lines of patch files do not end with an newline.
- Diff of area/inline/TextArea.java is incomplete.
- Compile error in render/xml/XMLRenderer.java.
I did manage to fix all problems.

Comment 31 Simon Pepping 2004-08-19 19:30:51 UTC

Luca,

This is a good piece of work again. Nevertheless, I did succeed in
breaking the code. See the following test fo files.

Nested inline and other LMs: The output contains errors, see the
comments in the text. The errors occur when hyphenation is set to
true.

Justification: This is a test fo you submitted earlier. According to
the text in the file the second block should be hyphenated; it is
not. Should it still be hyphenated, or can this not be enforced with
the Knuth algorithm and text-align="start"?

No breakpoints: An exception is thrown, at
LineLayoutManager.getNextBreakPoss(LineLayoutManager.java:495). It
occurs because breakpoints has size 0; the third call to
findBreakingPoints also returned 0. This should not be possible; the
algorithm should always return a breakpoint.

A few small remarks:

Can you move the following log messages to trace log level:
[DEBUG] AbstractLayoutManager - - Word to hyphenate: We

In TextLM, returning null for a forced LF is not an idea that I like,
because it overloads the null return value. Cannot you return an
special Knuth element for LF? Alternatively, you could return null and
process the paragraph. The second paragraph would then be produced and
processed later.

InlineStackingLM.getNextKnuthElements: 'if (lc.startsNewArea())' no
longer used?

Regards, Simon

Comment 32 Simon Pepping 2004-08-19 19:32:11 UTC

Created attachment 12492 [details]
test fo: Nested inline and other LMs

Comment 33 Simon Pepping 2004-08-19 19:32:58 UTC

Created attachment 12493 [details]
test fo: Justification

Comment 34 Simon Pepping 2004-08-19 19:33:45 UTC

Created attachment 12494 [details]
test fo: No breakpoints

Comment 35 Luca Furini 2004-08-24 15:24:26 UTC

I'm going to attach still another version of the patch :-), corrected 
according to Simon's comments.

The new files (Knuth*.java) are unchanged, so I don't attach them.

Regards, 
    Luca

Comment 36 Luca Furini 2004-08-24 15:25:53 UTC

Created attachment 12518 [details]
patch to existing files (sixth edition)

Comment 37 Simon Pepping 2004-08-27 19:29:36 UTC

Luca,

The build failed. That is probably due to this line in the patch:

cvs server: I know nothing about layoutmgr/LeafNodeLaoyutManager.java

Simon

Comment 38 Luca Furini 2004-08-28 10:06:48 UTC

Sorry again, I didn't notice that cvs error.

I'm going to attach the right (?) patch :-)
(for some strange reasons wincvs' diff shows Simon's latest changes too, I 
hope this is not a problem ...)

    Luca

Comment 39 Luca Furini 2004-08-28 10:09:09 UTC

Created attachment 12560 [details]
patch to existing files (version 7.1)

Comment 40 Luca Furini 2004-08-30 10:27:22 UTC

I applied the latest cvs commit and re-created the patch using linux's diff.
Then I succeeded in applying the patch, compiling and running a few fo tests, 
so I'm (almost) sure this time it will work! :-)

Sorry again for the long delay
    Luca

Comment 41 Luca Furini 2004-08-30 10:29:02 UTC

Created attachment 12566 [details]
patch to existing files (tested)

Comment 42 Simon Pepping 2004-08-31 18:44:14 UTC

Luca,

Thanks for the new patch. I could apply it without problems, and
testing it goes well.

You mention that you have not implemented the Knuth algorithm for
ContentLM. Would it be difficult to do that?

FOP team,

If I would apply this patch, we would get the following regressions:

- ContentLM does not show its content. A leader with
  leader-pattern="use-content" results in a blank area of the right
  size.

- When for an exceptionally difficult paragraph no set of breaking
  points can be found, the whole paragraph is printed on a single
  line. This occurs, for example, when in a narrow typesetting width
  only a single word or a part of it fits in a line.

I am working towards applying this patch despite these
regressions, for these reasons:

- This patch is a good piece of work, and a step forward for FOP's
  layout.

- It becomes increasingly hard to maintain this patch outside of CVS.

Please, speak up if you are against this.

Simon

Comment 43 Glen Mazza 2004-09-02 04:23:52 UTC

<Q>
I am working towards applying this patch despite these
regressions, for these reasons:

- This patch is a good piece of work, and a step forward for FOP's
  layout.

- It becomes increasingly hard to maintain this patch outside of CVS.

Please, speak up if you are against this.
</Q>

Simon, I have not had the time to be following this issue much so will be
deferring to your judgment.

Thanks,
Glen

Comment 44 Simon Pepping 2004-09-05 18:21:07 UTC

Luca,

Patch applied. Thanks for this innovative and extensive contribution.

Still to be done:

- Resolve the regressions mentioned above.
- I support the idea to create an InlineLayoutManager interface, which
  extends LayoutManager.
- This patch has made a lot of existing code redundant. Much of that
  code is still present. To keep the code clean and intelligible, the
  redundant pieces should be removed at some time by somebody.

I have added a space after casts. See the style guidelines in the file
dev/conventions.html of the FOP web site.

I have a few remarks about the code. I leave it to you to follow these
up or not, but I would like to see point 1 addressed:

1. In TextLM:
// linefeed; this can happen when linefeed-treatment="preserve"
// the linefeed character is the first one in textArray,
// so we can just return a list with a penalty item

In LineLM:
if (returnedList.size() > 1
    || !(thisElement.isPenalty() && ((KnuthPenalty)thisElement).getP() ==
-KnuthElement.INFINITE)) {
} else {
    // a list with a single penalty item whose value is -inf
    // represents a preserved linefeed, wich forces a line break

Can we be sure that U+A is always alone or the first item in a
textArray; does this not depend on the Parser, how it calls the SAX
characters method?

2. In InlineStackingLM.applyChanges: Falling over the end of
oldListIterator can be done better: treat only currLM != prevLM in the
loop, treat !oldListIterator.hasNext() after the loop. Same for
getChangedKnuthElements? :

while(oldListIterator.hasNext()) {
    oldElement = (KnuthElement)oldListIterator.next();
    currLM = oldElement.getLayoutManager();
    // initialize prevLM
    if (prevLM == null) {
        prevLM = currLM;
    }

    if (currLM != prevLM) {
        bSomethingChanged = prevLM.applyChanges(oldList.subList(fromIndex,
oldListIterator.previousIndex()))
                            || bSomethingChanged;
        prevLM = currLM;
        fromIndex = oldListIterator.previousIndex();
    }
}
bSomethingChanged = currLM.applyChanges(oldList.subList(fromIndex, oldList.size()))
                    || bSomethingChanged;

Possible cases, after the loop:
xxyy or yy, xx done
prevLM = currLM = y
fromIndex = last done (2 and 0)

3. In InlineStackingLM: Unnecessary differences between treatment of
returnedList and returnList in getNextKnuthElements and
getChangedKnuthElements. In getChangedKnuthElements it is not
necessary to have a separate returnedList and returnList.

4. Break up long methods in LineLM: findHyphenationPoints,
getNextBreakPoss, considerLegalBreak (?), findBreakingPoints (?).

Regards, Simon

Comment 45 Glenn Adams 2012-04-01 06:27:15 UTC

batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed