To reproduce bug please do the following: Use: <fo:region-body region-name="PageBody" column-count="2" /> Fill the region-body with content like : <fo:block span="none" > ...(content is table with header) .. <fo:block span="all"> ... (one line (summary)) .. If the content is made of normal blocks the columns are balanced before the span="all" summary line. like: 123456 456789 234567 567890 345678 678901 Summary: 1234567890 If the content is a table without headers the columns are balanced correct. like: 123456 456789 234567 567890 345678 678901 Summary: 1234567890 If the content is a table with header the columns are not balanced correct. (the right one is shorter.) Header Header 123456 567890 234567 678901 345678 456789 Summary: 1234567890 The "computeDemerits()" algorithm is wrong in class "BalancingColumnBreakingAlgorithm". The "fullLen" value is to short. Exactly the replicated header width is missing. In the "par" list the header is contained only once although the header is displayed in every column. (in the example twice) Solution could be to place the header as many times in the "par" list as many columns exist, or to count the existing one header as many times as needed.
Created attachment 25928 [details] test file
The algorithm does indeed not take the penalty width at the break point into account. I've experimented for about an hour but still haven't found a solution. That class is a hack by itself right now. I have never found a really good algorithm for calculating the demerits for column balancing. I guess this needs another serious review.
I've been asked to look closer into this, and after about 10 hours spent brooding over it and consulting the "Digital Typography" book I'm no closer to solving it. I've been able to calculate the demerits which I expected to result in the right break decision (for one case), but the algorithm still chose another breaking point for reasons I just don't understand. The BalancingColumnBreakingAlgorithm class was some kind of hack to begin with which I knew had some short-comings. That is now pretty obvious, and I don't know how to proceed. I have to give up for now since this is simply a bit too high for me. I'll copy my local changes away so if anyone wants to give it a shot, I can provide them. But I'm not sure if they would really help. If anyone can help Adam, please get in touch with him.
Created attachment 26606 [details] altered test case The fullLen in computeDemerits() is indeed off, because ElementListUtils.calcContentLength() naively sums box and glue lengths. It just counts the header box once, and does not contain any logic to take into account possible special behavior of the penalties. Adding the header box multiple times to the list will not immediately solve the issue. Question that needs to be answered in order to do that: how many times would it need to be duplicated? Answer: the number of times the table is broken plus 1. But we don't know that number, at that point... That is precisely the reason the table-layout code only adds the box once. See also the attached altered test case for a slightly more complicated example. The first and last table's headers only need to be counted once, while the header and footer of the second table need to be counted --how many times? The algorithm currently only breaks once. Seems like the chicken and the egg... :-/
Additionally, it's not only fullLen which is off, but obviously also partLen and restLen, for the same reason (calcContentLength()). There will be cases where the header will have to be taken into account for partLen, but not for restLen or vice versa, and cases where it will need to be counted for both. Currently, the algorithm operates as if the header and footer only count for the part where the table happens to end.
Just thought of similar potential issues, due to the limitations of ElementListUtils.calcContentLength(), but unrelated to column-balancing per se, that are to be expected in list-items and table-cells. More precisely those that have a table with header and/or footer as a descendant, and are divided over more than one column/page. For lists, problems may arise in ListItemLayoutManager.getCombinedKnuthElementsForListItem(), I think, where the calculated lengths could prove to be wrong (due to the content eventually being broken at least once). For table-cells, similarly, the totalLength in the ActiveCell constructor might be too low if the cell contains a long nested table that causes multiple breaks...
Thinking more about a possible fix, I am suddenly asking myself why we would even fiddle with the demerits at all to balance columns. Playing with the demerits alone is not enough to make a break more/less favorable. Ultimately, the adjustment ratio is a deciding factor too, and that is based on the difference between available width and the width of the line. Since the algorithm works with the full bpd of the page as line-width, this is bound to cause the algorithm to 'ignore' the better demerits at some points. In a way, that feels like we could handle column-balancing better by (also) forcing the available line-width down to the minimal width needed by the content (i.e. taking into account pieces that must be kept together within the same column).
resetting severity from major to normal pending further review
resetting P2 open bugs to P3 pending further review
lowering priority and changing to enhancement since (1) a fix is likely to require a fairly significant design change and (2) column balancing is a (potential) XSL-FO 2.0 feature [1], not an XSL-FO 1.X feature; [1] http://www.w3.org/TR/xslfo20-req/#N66487
Even though this is not explicitly specified in XSL-FO 1.X, columns are expected to be balanced. For our documents, unbalanced columns, are not an acceptable output. Unfortunately, we were not able to find a solution. I can only attach a small patch of a dirty hack that gives the desired result for the initially attached test file, has been working well for us for the last 2 years and (of course) passes all the unit tests.
Created attachment 28662 [details] Dirty hack
Created attachment 28663 [details] Current output PDF
Created attachment 28664 [details] Expected output PDF (after patch)
The patch is against revision 1329104 of trunk.
Created attachment 29653 [details] New column balancing algorithm implementation This patch is a re-write of the coloumn balancing algorithm to fix the issues that existed within the old code. We have been working on this for the last couple of weeks and has been a team effort by me, Luis Bernado and Vincent Hennebert. It passes all unit tests and renders not only the tests above correctly but also many others we have thrown at it. I will post another zip attachment containing sample balancing fo's for you to try should you wish.
Created attachment 29655 [details] Sample list of test column balancing fo files A zip containing a list of sample column balancing FO files
Great work !! I tested it with a multi-page complex layout and it is working very well. Luckily for us, I see the same output as with the hack I had attached. Next week, a colleague will execute more tests on other documents with complex layout and will check some corner cases. I will add a comment if something comes up.