Bug 49801 - [PATCH] Region-Body Column balancing incorrect if content is table with header
Summary: [PATCH] Region-Body Column balancing incorrect if content is table with header
Status: NEW
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: page-master/layout (show other bugs)
Version: 1.0
Hardware: PC All
: P5 enhancement
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-23 05:56 UTC by a.kovacs
Modified: 2012-11-30 11:11 UTC (History)
2 users (show)



Attachments
test file (3.80 KB, text/plain)
2010-08-23 06:05 UTC, a.kovacs
Details
altered test case (4.41 KB, text/plain)
2011-02-04 13:18 UTC, Andreas L. Delmelle
Details
Dirty hack (1.37 KB, patch)
2012-04-23 22:51 UTC, Alex Giotis
Details | Diff
Current output PDF (5.30 KB, application/pdf)
2012-04-23 22:53 UTC, Alex Giotis
Details
Expected output PDF (after patch) (5.30 KB, application/pdf)
2012-04-23 22:54 UTC, Alex Giotis
Details
New column balancing algorithm implementation (47.98 KB, patch)
2012-11-29 10:02 UTC, Robert Meyer
Details | Diff
Sample list of test column balancing fo files (16.96 KB, application/x-zip-compressed)
2012-11-29 10:05 UTC, Robert Meyer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description a.kovacs 2010-08-23 05:56:02 UTC
To reproduce bug please do the following:
Use:
<fo:region-body region-name="PageBody" column-count="2" />

Fill the region-body with content like :
<fo:block span="none" > ...(content is table with header) ..
<fo:block span="all"> ... (one line (summary)) ..

If the content is made of normal blocks the columns are balanced before the span="all" summary line.
like:
123456    456789
234567    567890
345678    678901
Summary: 1234567890

If the content is a table without headers the columns are balanced correct. 
like:
123456    456789
234567    567890
345678    678901
Summary: 1234567890

If the content is a table with header the columns are not balanced correct. (the right one is shorter.)
Header    Header
123456    567890
234567    678901
345678    
456789
Summary: 1234567890

The "computeDemerits()" algorithm is wrong in class "BalancingColumnBreakingAlgorithm".
The "fullLen" value is to short. Exactly the replicated header width is missing. In the "par" list the header is contained only once although the header is displayed in every column. (in the example twice)

Solution could be to place the header as many times in the "par" list as many columns exist, or to count the existing one header as many times as needed.
Comment 1 a.kovacs 2010-08-23 06:05:19 UTC
Created attachment 25928 [details]
test file
Comment 2 Jeremias Maerki 2010-08-30 09:52:53 UTC
The algorithm does indeed not take the penalty width at the break point into account. I've experimented for about an hour but still haven't found a solution. That class is a hack by itself right now. I have never found a really good algorithm for calculating the demerits for column balancing. I guess this needs another serious review.
Comment 3 Jeremias Maerki 2010-09-06 09:46:13 UTC
I've been asked to look closer into this, and after about 10 hours spent brooding over it and consulting the "Digital Typography" book I'm no closer to solving it. I've been able to calculate the demerits which I expected to result in the right break decision (for one case), but the algorithm still chose another breaking point for reasons I just don't understand. The BalancingColumnBreakingAlgorithm class was some kind of hack to begin with which I knew had some short-comings. That is now pretty obvious, and I don't know how to proceed. I have to give up for now since this is simply a bit too high for me. I'll copy my local changes away so if anyone wants to give it a shot, I can provide them. But I'm not sure if they would really help.

If anyone can help Adam, please get in touch with him.
Comment 4 Andreas L. Delmelle 2011-02-04 13:18:28 UTC
Created attachment 26606 [details]
altered test case

The fullLen in computeDemerits() is indeed off, because ElementListUtils.calcContentLength() naively sums box and glue lengths. It just counts the header box once, and does not contain any logic to take into account possible special behavior of the penalties. Adding the header box multiple times to the list will not immediately solve the issue.
Question that needs to be answered in order to do that: how many times would it need to be duplicated? Answer: the number of times the table is broken plus 1. But we don't know that number, at that point... That is precisely the reason the table-layout code only adds the box once.

See also the attached altered test case for a slightly more complicated example. The first and last table's headers only need to be counted once, while the header and footer of the second table need to be counted --how many times? The algorithm currently only breaks once.

Seems like the chicken and the egg... :-/
Comment 5 Andreas L. Delmelle 2011-02-04 13:27:07 UTC
Additionally, it's not only fullLen which is off, but obviously also partLen and restLen, for the same reason (calcContentLength()). There will be cases where the header will have to be taken into account for partLen, but not for restLen or vice versa, and cases where it will need to be counted for both.

Currently, the algorithm operates as if the header and footer only count for the part where the table happens to end.
Comment 6 Andreas L. Delmelle 2011-02-08 15:19:45 UTC
Just thought of similar potential issues, due to the limitations of ElementListUtils.calcContentLength(), but unrelated to column-balancing per se, that are to be expected in list-items and table-cells.
More precisely those that have a table with header and/or footer as a descendant, and are divided over more than one column/page.
For lists, problems may arise in ListItemLayoutManager.getCombinedKnuthElementsForListItem(), I think, where the calculated lengths could prove to be wrong (due to the content eventually being broken at least once).
For table-cells, similarly, the totalLength in the ActiveCell constructor might be too low if the cell contains a long nested table that causes multiple breaks...
Comment 7 Andreas L. Delmelle 2011-03-21 15:27:29 UTC
Thinking more about a possible fix, I am suddenly asking myself why we would even fiddle with the demerits at all to balance columns. Playing with the demerits alone is not enough to make a break more/less favorable. Ultimately, the adjustment ratio is a deciding factor too, and that is based on the difference between available width and the width of the line.

Since the algorithm works with the full bpd of the page as line-width, this is bound to cause the algorithm to 'ignore' the better demerits at some points.

In a way, that feels like we could handle column-balancing better by (also) forcing the available line-width down to the minimal width needed by the content (i.e. taking into account pieces that must be kept together within the same column).
Comment 8 Glenn Adams 2012-04-07 01:37:32 UTC
resetting severity from major to normal pending further review
Comment 9 Glenn Adams 2012-04-07 01:42:15 UTC
resetting P2 open bugs to P3 pending further review
Comment 10 Glenn Adams 2012-04-21 05:33:07 UTC
lowering priority and changing to enhancement since (1) a fix is likely to require a fairly significant design change and (2) column balancing is a (potential) XSL-FO 2.0 feature [1], not an XSL-FO 1.X feature;

[1] http://www.w3.org/TR/xslfo20-req/#N66487
Comment 11 Alex Giotis 2012-04-23 22:49:00 UTC
Even though this is not explicitly specified in XSL-FO 1.X, columns are expected to be balanced. For our documents, unbalanced columns, are not an acceptable output. Unfortunately, we were not able to find a solution. I can only attach a small patch of a dirty hack that gives the desired result for the initially attached test file, has been working well for us for the last 2 years and (of course) passes all the unit tests.
Comment 12 Alex Giotis 2012-04-23 22:51:58 UTC
Created attachment 28662 [details]
Dirty hack
Comment 13 Alex Giotis 2012-04-23 22:53:48 UTC
Created attachment 28663 [details]
Current output PDF
Comment 14 Alex Giotis 2012-04-23 22:54:46 UTC
Created attachment 28664 [details]
Expected output PDF (after patch)
Comment 15 Alex Giotis 2012-04-23 23:00:30 UTC
The patch is against revision 1329104 of trunk.
Comment 16 Robert Meyer 2012-11-29 10:02:39 UTC
Created attachment 29653 [details]
New column balancing algorithm implementation

This patch is a re-write of the coloumn balancing algorithm to fix the issues that existed within the old code. We have been working on this for the last couple of weeks and has been a team effort by me, Luis Bernado and Vincent Hennebert.

It passes all unit tests and renders not only the tests above correctly but also many others we have thrown at it. I will post another zip attachment containing sample balancing fo's for you to try should you wish.
Comment 17 Robert Meyer 2012-11-29 10:05:55 UTC
Created attachment 29655 [details]
Sample list of test column balancing fo files

A zip containing a list of sample column balancing FO files
Comment 18 Alex Giotis 2012-11-30 11:11:44 UTC
Great work !!

I tested it with a multi-page complex layout and it is working very well. Luckily for us, I see the same output as with the hack I had attached. Next week, a colleague will execute more tests on other documents with complex layout and will check some corner cases. I will add a comment if something comes up.