Bug 46828 - [PATCH] Enabling area tree caching from FOUserAgent
Summary: [PATCH] Enabling area tree caching from FOUserAgent
Status: CLOSED FIXED
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: general (show other bugs)
Version: trunk
Hardware: All All
: P2 normal
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-10 07:24 UTC by Laera Dario
Modified: 2012-04-01 07:05 UTC (History)
0 users



Attachments
Bug fixes and FOUserAgent facility (7.37 KB, patch)
2009-03-10 07:26 UTC, Laera Dario
Details | Diff
Revised patch proposal (12.61 KB, text/plain)
2009-03-12 12:52 UTC, Andreas L. Delmelle
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Laera Dario 2009-03-10 07:24:43 UTC
The attached patch fix some bugs in area tree caching and allow to enable caching from FOUserAgent. We use it in production without problems and would be nice to see it included in the next stable release.
I know that few users would benefit from this patch, but at the same time it seems harmless to me.

Patch details:
* CachedRenderPagesModel.java: page.toString() was producing a bad file name, in particular all objects were written in the same .ser file and Fop wasn't able to read back those objects when there was too much pages
* InlineArea.java: need to be serializable
* AbstractLayoutManager.java, TableCellLayoutManager.java: just free references to the objects serialized
Comment 1 Laera Dario 2009-03-10 07:26:17 UTC
Created attachment 23366 [details]
Bug fixes and FOUserAgent facility
Comment 2 Andreas L. Delmelle 2009-03-10 14:53:00 UTC
Thanks for the patch. I'm going to apply it, with just one change.

Not sure if anyone else agrees, but the modification to AbstractLayoutManager seems to belong in BlockStackingLayoutManager...? 
It seems cleaner to override notifyEndOfLayout() in BlockStackingLM, and depend on the runtime type-checking to trigger the correct implementation, instead of forcing the explicit instanceof check on all concrete subclasses.

In BlockStackingLM:

    /** {@inheritDoc} */
    protected void notifyEndOfLayout() {
        super.notifyEndOfLayout();
        // Free memory of the area tree
        this.parentArea = null;
    }

Should have the exact same effect, but the change is then made only to the affected class source.

(I'm not too happy with the explicit references to FlowLM and PageSequenceLM a bit further down --I put them there myself, though :-S At any rate, references to concrete subclasses in an abstract superclass just always smell a bit strange...)

BTW: if using caching is beneficial, we may want to consider defaulting to 'true'. Opinions?
Comment 3 Jeremias Maerki 2009-03-11 00:19:35 UTC
Thanks for the patch, Dario. I agree with Andreas that the code in AbstractLayoutManager should really be in BlockStackingLayoutManager. I can faintly remember that I've once investigated the right time to release memory in a layout manager. I think that was for page-number-citation-last. I'm not sure that we already always know exactly when a layout manager is really finished. Anyway, we've got a good test suite so any bugs in that area should quickly be found.

@Andreas: what does "beneficial" mean? Beneficial in terms of memory consumption or of speed? I think that can be mutually exclusive. Before we decide to enable this by default I would want to have some numbers (performance & memory usage) to base the decision upon.

Another comment to the patch: I'm not particularly happy about calling the temporary area tree unloading to disk simply "the cache". We have a font cache, an image cache and others. I'm not even sure if this is really a classic "cache" as such. We need a better name for this IMO. How about FOUserAgent.setTemporaryAreaTreeSerializationEnabled(boolean)? Or FOUserAgent.setDiskBasedRenderPagesModel(boolean)? A bit lengthy but at least more speaking. Or FOUserAgent.setConserveMemoryPolicy(boolean) which would be a less specific approach that other code parts also could use as hint?
Comment 4 Laera Dario 2009-03-11 02:13:35 UTC
(In reply to comment #3)
> Thanks for the patch, Dario. I agree with Andreas that the code in
> AbstractLayoutManager should really be in BlockStackingLayoutManager. I can
> faintly remember that I've once investigated the right time to release memory
> in a layout manager. I think that was for page-number-citation-last. I'm not
> sure that we already always know exactly when a layout manager is really
> finished. Anyway, we've got a good test suite so any bugs in that area should
> quickly be found.

I was working on this patch in October, I remember that the automated test on the area generated was failing because the check was based on the freed area... Anyway I've never encountered problems in my typical usage, fop just prints this warning when updating the area:

11-mar-2009 9.24.58 org.apache.fop.area.inline.UnresolvedPageNumber resolveIDRef
AVVERTENZA: Cannot update the IPD of an unresolved page number. No font information available.

But the final result is fine.

> 
> @Andreas: what does "beneficial" mean? Beneficial in terms of memory
> consumption or of speed? I think that can be mutually exclusive. Before we
> decide to enable this by default I would want to have some numbers (performance
> & memory usage) to base the decision upon.

I think that area caching shouldn't be the default option: it provides a significant memory improvement for some cases (large documents with many page-seq plus page citation) but in most cases it would just slowdown the process (disk access for read/write) without any benefit.
It is useful in my scenario where I produce 50000 pages document, that means 400-500 MB of RAM and 4 GB of disk cache (that otherwise should reside in memory...).
For completeness, if I just want the total page number I can achieve better memory results by using the hack of the two pass indicated in http://xmlgraphics.apache.org/fop/fo.html#fo-total-pages, but it is slower. Theoretically it should be twice slower, but in practice the update process plus the disk access reduces the time gap.
Anyway I think that the two pass method can't work if you have many cross reference between page-seqs and not only the total page number.


> 
> Another comment to the patch: I'm not particularly happy about calling the
> temporary area tree unloading to disk simply "the cache". We have a font cache,
> an image cache and others. I'm not even sure if this is really a classic
> "cache" as such. We need a better name for this IMO. How about
> FOUserAgent.setTemporaryAreaTreeSerializationEnabled(boolean)? Or
> FOUserAgent.setDiskBasedRenderPagesModel(boolean)? A bit lengthy but at least
> more speaking. Or FOUserAgent.setConserveMemoryPolicy(boolean) which would be a
> less specific approach that other code parts also could use as hint?

I agree, "cache" is inappropriate, "setConserveMemoryPolicy" is nice and blink the eye to non fop-dev that just want to embed fop.
Comment 5 Andreas L. Delmelle 2009-03-11 10:46:08 UTC
Having read the feedback, I agree with initially setting it to false. I completely overlooked the fact that we get increased disk I/O as a result.

With respect to LM-cleanup (maybe interesting to know): 
I did make some changes (months ago) which release the FO subtree when an LM has finished adding its areas. It does not work yet in all cases, only for straightforward blocks and inlines. Most notably: it does not work for table-elements, and there are some cases with nested blocks and border-after or space-after where it doesn't work either.
Reason is that the approach relies on the isLast() method. Sometimes it returns false for the parent BlockLM, because the checked Position actually belongs to the childLM, and the real last position for the parent LM is not available in the PositionIterator that is passed to addAreas().
Comment 6 Andreas L. Delmelle 2009-03-12 12:52:32 UTC
Created attachment 23375 [details]
Revised patch proposal

OK, I'm about ready to commit. 
One thing I'm slightly unsure of: current FOP Trunk has the call to notifyEndOfLayout() for the TableCellLM at the end of the getNextKnuthElements() method. For virtually all other LMs, this call is triggered at the end of addAreas(). Does anyone know if moving that call to addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was intentional or something that was forgotten and nobody ever noticed.

Apart from that, is everyone OK with using:
member: conserveMemoryPolicy
getter: isConserveMemoryPolicyEnabled()
setter: setConserveMemoryPolicy(boolean)

-> via the command-line: -conserve? or something more cryptic, like -cmp? ;-)
Comment 7 Andreas L. Delmelle 2009-03-12 12:54:51 UTC
Just noticed: the patch also contains some minor cleanup in BlockStackingLM (removals of redundant casts and simplification of conditionals). Hope this raises no objection...
Comment 8 Jeremias Maerki 2009-03-13 02:26:28 UTC
(In reply to comment #6)
> Created an attachment (id=23375) [details]
> Revised patch proposal
> 
> OK, I'm about ready to commit. 

Small omission: the "-cache" in CommandLineOptions needs to be deleted. Otherwise, this looks ok, but I haven't done any tests myself.

> One thing I'm slightly unsure of: current FOP Trunk has the call to
> notifyEndOfLayout() for the TableCellLM at the end of the
> getNextKnuthElements() method. For virtually all other LMs, this call is
> triggered at the end of addAreas(). Does anyone know if moving that call to
> addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was
> intentional or something that was forgotten and nobody ever noticed.

notifyEndOfLayout() in getNextKnuthElements() would be the mistake, not the move. If the tests run through ok, you should be fine. Otherwise, we are missing a test. ;-)

> Apart from that, is everyone OK with using:
> member: conserveMemoryPolicy
> getter: isConserveMemoryPolicyEnabled()
> setter: setConserveMemoryPolicy(boolean)

/me nods

> -> via the command-line: -conserve? or something more cryptic, like -cmp? ;-)

-conserve is fine IMO.
Comment 9 Laera Dario 2009-03-13 03:19:28 UTC
(In reply to comment #8)
> > One thing I'm slightly unsure of: current FOP Trunk has the call to
> > notifyEndOfLayout() for the TableCellLM at the end of the
> > getNextKnuthElements() method. For virtually all other LMs, this call is
> > triggered at the end of addAreas(). Does anyone know if moving that call to
> > addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was
> > intentional or something that was forgotten and nobody ever noticed.
> 
> notifyEndOfLayout() in getNextKnuthElements() would be the mistake, not the
> move. If the tests run through ok, you should be fine. Otherwise, we are
> missing a test. ;-)

Alas, I'm no more able to comment this kind of details, I lost some of my knowledge in few months :(

I agree about other comments
Comment 10 Andreas L. Delmelle 2009-03-13 10:53:08 UTC
Changes committed with r753327.

Thanks for submitting the patch, Daerio!
Comment 11 Glenn Adams 2012-04-01 07:05:29 UTC
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed