Fop
  1. Fop
  2. FOP-1637

[PATCH] Enabling area tree caching from FOUserAgent

    Details

    • Type: Bug Bug
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: None
    • Component/s: unqualified
    • Labels:
      None
    • Environment:
      Operating System: All
      Platform: All
    • External issue ID:
      46828

      Description

      The attached patch fix some bugs in area tree caching and allow to enable caching from FOUserAgent. We use it in production without problems and would be nice to see it included in the next stable release.
      I know that few users would benefit from this patch, but at the same time it seems harmless to me.

      Patch details:

      • CachedRenderPagesModel.java: page.toString() was producing a bad file name, in particular all objects were written in the same .ser file and Fop wasn't able to read back those objects when there was too much pages
      • InlineArea.java: need to be serializable
      • AbstractLayoutManager.java, TableCellLayoutManager.java: just free references to the objects serialized
      1. Bugzilla46828.patch
        13 kB
        Andreas L. Delmelle
      2. freeAreaTree4.diff
        7 kB
        Laera Dario

        Activity

        Hide
        Laera Dario added a comment -

        Attachment freeAreaTree4.diff has been added with description: Bug fixes and FOUserAgent facility

        Show
        Laera Dario added a comment - Attachment freeAreaTree4.diff has been added with description: Bug fixes and FOUserAgent facility
        Hide
        Andreas L. Delmelle added a comment -

        Thanks for the patch. I'm going to apply it, with just one change.

        Not sure if anyone else agrees, but the modification to AbstractLayoutManager seems to belong in BlockStackingLayoutManager...?
        It seems cleaner to override notifyEndOfLayout() in BlockStackingLM, and depend on the runtime type-checking to trigger the correct implementation, instead of forcing the explicit instanceof check on all concrete subclasses.

        In BlockStackingLM:

        /**

        {@inheritDoc}

        */
        protected void notifyEndOfLayout()

        { super.notifyEndOfLayout(); // Free memory of the area tree this.parentArea = null; }

        Should have the exact same effect, but the change is then made only to the affected class source.

        (I'm not too happy with the explicit references to FlowLM and PageSequenceLM a bit further down --I put them there myself, though :-S At any rate, references to concrete subclasses in an abstract superclass just always smell a bit strange...)

        BTW: if using caching is beneficial, we may want to consider defaulting to 'true'. Opinions?

        Show
        Andreas L. Delmelle added a comment - Thanks for the patch. I'm going to apply it, with just one change. Not sure if anyone else agrees, but the modification to AbstractLayoutManager seems to belong in BlockStackingLayoutManager...? It seems cleaner to override notifyEndOfLayout() in BlockStackingLM, and depend on the runtime type-checking to trigger the correct implementation, instead of forcing the explicit instanceof check on all concrete subclasses. In BlockStackingLM: /** {@inheritDoc} */ protected void notifyEndOfLayout() { super.notifyEndOfLayout(); // Free memory of the area tree this.parentArea = null; } Should have the exact same effect, but the change is then made only to the affected class source. (I'm not too happy with the explicit references to FlowLM and PageSequenceLM a bit further down --I put them there myself, though :-S At any rate, references to concrete subclasses in an abstract superclass just always smell a bit strange...) BTW: if using caching is beneficial, we may want to consider defaulting to 'true'. Opinions?
        Hide
        Jeremias Maerki added a comment -

        Thanks for the patch, Dario. I agree with Andreas that the code in AbstractLayoutManager should really be in BlockStackingLayoutManager. I can faintly remember that I've once investigated the right time to release memory in a layout manager. I think that was for page-number-citation-last. I'm not sure that we already always know exactly when a layout manager is really finished. Anyway, we've got a good test suite so any bugs in that area should quickly be found.

        @Andreas: what does "beneficial" mean? Beneficial in terms of memory consumption or of speed? I think that can be mutually exclusive. Before we decide to enable this by default I would want to have some numbers (performance & memory usage) to base the decision upon.

        Another comment to the patch: I'm not particularly happy about calling the temporary area tree unloading to disk simply "the cache". We have a font cache, an image cache and others. I'm not even sure if this is really a classic "cache" as such. We need a better name for this IMO. How about FOUserAgent.setTemporaryAreaTreeSerializationEnabled(boolean)? Or FOUserAgent.setDiskBasedRenderPagesModel(boolean)? A bit lengthy but at least more speaking. Or FOUserAgent.setConserveMemoryPolicy(boolean) which would be a less specific approach that other code parts also could use as hint?

        Show
        Jeremias Maerki added a comment - Thanks for the patch, Dario. I agree with Andreas that the code in AbstractLayoutManager should really be in BlockStackingLayoutManager. I can faintly remember that I've once investigated the right time to release memory in a layout manager. I think that was for page-number-citation-last. I'm not sure that we already always know exactly when a layout manager is really finished. Anyway, we've got a good test suite so any bugs in that area should quickly be found. @Andreas: what does "beneficial" mean? Beneficial in terms of memory consumption or of speed? I think that can be mutually exclusive. Before we decide to enable this by default I would want to have some numbers (performance & memory usage) to base the decision upon. Another comment to the patch: I'm not particularly happy about calling the temporary area tree unloading to disk simply "the cache". We have a font cache, an image cache and others. I'm not even sure if this is really a classic "cache" as such. We need a better name for this IMO. How about FOUserAgent.setTemporaryAreaTreeSerializationEnabled(boolean)? Or FOUserAgent.setDiskBasedRenderPagesModel(boolean)? A bit lengthy but at least more speaking. Or FOUserAgent.setConserveMemoryPolicy(boolean) which would be a less specific approach that other code parts also could use as hint?
        Hide
        Laera Dario added a comment -

        (In reply to comment #3)
        > Thanks for the patch, Dario. I agree with Andreas that the code in
        > AbstractLayoutManager should really be in BlockStackingLayoutManager. I can
        > faintly remember that I've once investigated the right time to release memory
        > in a layout manager. I think that was for page-number-citation-last. I'm not
        > sure that we already always know exactly when a layout manager is really
        > finished. Anyway, we've got a good test suite so any bugs in that area should
        > quickly be found.

        I was working on this patch in October, I remember that the automated test on the area generated was failing because the check was based on the freed area... Anyway I've never encountered problems in my typical usage, fop just prints this warning when updating the area:

        11-mar-2009 9.24.58 org.apache.fop.area.inline.UnresolvedPageNumber resolveIDRef
        AVVERTENZA: Cannot update the IPD of an unresolved page number. No font information available.

        But the final result is fine.

        >
        > @Andreas: what does "beneficial" mean? Beneficial in terms of memory
        > consumption or of speed? I think that can be mutually exclusive. Before we
        > decide to enable this by default I would want to have some numbers (performance
        > & memory usage) to base the decision upon.

        I think that area caching shouldn't be the default option: it provides a significant memory improvement for some cases (large documents with many page-seq plus page citation) but in most cases it would just slowdown the process (disk access for read/write) without any benefit.
        It is useful in my scenario where I produce 50000 pages document, that means 400-500 MB of RAM and 4 GB of disk cache (that otherwise should reside in memory...).
        For completeness, if I just want the total page number I can achieve better memory results by using the hack of the two pass indicated in http://xmlgraphics.apache.org/fop/fo.html#fo-total-pages, but it is slower. Theoretically it should be twice slower, but in practice the update process plus the disk access reduces the time gap.
        Anyway I think that the two pass method can't work if you have many cross reference between page-seqs and not only the total page number.

        >
        > Another comment to the patch: I'm not particularly happy about calling the
        > temporary area tree unloading to disk simply "the cache". We have a font cache,
        > an image cache and others. I'm not even sure if this is really a classic
        > "cache" as such. We need a better name for this IMO. How about
        > FOUserAgent.setTemporaryAreaTreeSerializationEnabled(boolean)? Or
        > FOUserAgent.setDiskBasedRenderPagesModel(boolean)? A bit lengthy but at least
        > more speaking. Or FOUserAgent.setConserveMemoryPolicy(boolean) which would be a
        > less specific approach that other code parts also could use as hint?

        I agree, "cache" is inappropriate, "setConserveMemoryPolicy" is nice and blink the eye to non fop-dev that just want to embed fop.

        Show
        Laera Dario added a comment - (In reply to comment #3) > Thanks for the patch, Dario. I agree with Andreas that the code in > AbstractLayoutManager should really be in BlockStackingLayoutManager. I can > faintly remember that I've once investigated the right time to release memory > in a layout manager. I think that was for page-number-citation-last. I'm not > sure that we already always know exactly when a layout manager is really > finished. Anyway, we've got a good test suite so any bugs in that area should > quickly be found. I was working on this patch in October, I remember that the automated test on the area generated was failing because the check was based on the freed area... Anyway I've never encountered problems in my typical usage, fop just prints this warning when updating the area: 11-mar-2009 9.24.58 org.apache.fop.area.inline.UnresolvedPageNumber resolveIDRef AVVERTENZA: Cannot update the IPD of an unresolved page number. No font information available. But the final result is fine. > > @Andreas: what does "beneficial" mean? Beneficial in terms of memory > consumption or of speed? I think that can be mutually exclusive. Before we > decide to enable this by default I would want to have some numbers (performance > & memory usage) to base the decision upon. I think that area caching shouldn't be the default option: it provides a significant memory improvement for some cases (large documents with many page-seq plus page citation) but in most cases it would just slowdown the process (disk access for read/write) without any benefit. It is useful in my scenario where I produce 50000 pages document, that means 400-500 MB of RAM and 4 GB of disk cache (that otherwise should reside in memory...). For completeness, if I just want the total page number I can achieve better memory results by using the hack of the two pass indicated in http://xmlgraphics.apache.org/fop/fo.html#fo-total-pages , but it is slower. Theoretically it should be twice slower, but in practice the update process plus the disk access reduces the time gap. Anyway I think that the two pass method can't work if you have many cross reference between page-seqs and not only the total page number. > > Another comment to the patch: I'm not particularly happy about calling the > temporary area tree unloading to disk simply "the cache". We have a font cache, > an image cache and others. I'm not even sure if this is really a classic > "cache" as such. We need a better name for this IMO. How about > FOUserAgent.setTemporaryAreaTreeSerializationEnabled(boolean)? Or > FOUserAgent.setDiskBasedRenderPagesModel(boolean)? A bit lengthy but at least > more speaking. Or FOUserAgent.setConserveMemoryPolicy(boolean) which would be a > less specific approach that other code parts also could use as hint? I agree, "cache" is inappropriate, "setConserveMemoryPolicy" is nice and blink the eye to non fop-dev that just want to embed fop.
        Hide
        Andreas L. Delmelle added a comment -

        Having read the feedback, I agree with initially setting it to false. I completely overlooked the fact that we get increased disk I/O as a result.

        With respect to LM-cleanup (maybe interesting to know):
        I did make some changes (months ago) which release the FO subtree when an LM has finished adding its areas. It does not work yet in all cases, only for straightforward blocks and inlines. Most notably: it does not work for table-elements, and there are some cases with nested blocks and border-after or space-after where it doesn't work either.
        Reason is that the approach relies on the isLast() method. Sometimes it returns false for the parent BlockLM, because the checked Position actually belongs to the childLM, and the real last position for the parent LM is not available in the PositionIterator that is passed to addAreas().

        Show
        Andreas L. Delmelle added a comment - Having read the feedback, I agree with initially setting it to false. I completely overlooked the fact that we get increased disk I/O as a result. With respect to LM-cleanup (maybe interesting to know): I did make some changes (months ago) which release the FO subtree when an LM has finished adding its areas. It does not work yet in all cases, only for straightforward blocks and inlines. Most notably: it does not work for table-elements, and there are some cases with nested blocks and border-after or space-after where it doesn't work either. Reason is that the approach relies on the isLast() method. Sometimes it returns false for the parent BlockLM, because the checked Position actually belongs to the childLM, and the real last position for the parent LM is not available in the PositionIterator that is passed to addAreas().
        Hide
        Andreas L. Delmelle added a comment -

        OK, I'm about ready to commit.
        One thing I'm slightly unsure of: current FOP Trunk has the call to notifyEndOfLayout() for the TableCellLM at the end of the getNextKnuthElements() method. For virtually all other LMs, this call is triggered at the end of addAreas(). Does anyone know if moving that call to addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was intentional or something that was forgotten and nobody ever noticed.

        Apart from that, is everyone OK with using:
        member: conserveMemoryPolicy
        getter: isConserveMemoryPolicyEnabled()
        setter: setConserveMemoryPolicy(boolean)

        -> via the command-line: -conserve? or something more cryptic, like -cmp?

        Show
        Andreas L. Delmelle added a comment - OK, I'm about ready to commit. One thing I'm slightly unsure of: current FOP Trunk has the call to notifyEndOfLayout() for the TableCellLM at the end of the getNextKnuthElements() method. For virtually all other LMs, this call is triggered at the end of addAreas(). Does anyone know if moving that call to addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was intentional or something that was forgotten and nobody ever noticed. Apart from that, is everyone OK with using: member: conserveMemoryPolicy getter: isConserveMemoryPolicyEnabled() setter: setConserveMemoryPolicy(boolean) -> via the command-line: -conserve? or something more cryptic, like -cmp?
        Hide
        Andreas L. Delmelle added a comment -

        Attachment Bugzilla46828.patch has been added with description: Revised patch proposal

        Show
        Andreas L. Delmelle added a comment - Attachment Bugzilla46828.patch has been added with description: Revised patch proposal
        Hide
        Andreas L. Delmelle added a comment -


        Just noticed: the patch also contains some minor cleanup in BlockStackingLM (removals of redundant casts and simplification of conditionals). Hope this raises no objection...

        Show
        Andreas L. Delmelle added a comment - Just noticed: the patch also contains some minor cleanup in BlockStackingLM (removals of redundant casts and simplification of conditionals). Hope this raises no objection...
        Hide
        Jeremias Maerki added a comment -

        (In reply to comment #6)
        > Created an attachment (id=23375) [details]
        > Revised patch proposal
        >
        > OK, I'm about ready to commit.

        Small omission: the "-cache" in CommandLineOptions needs to be deleted. Otherwise, this looks ok, but I haven't done any tests myself.

        > One thing I'm slightly unsure of: current FOP Trunk has the call to
        > notifyEndOfLayout() for the TableCellLM at the end of the
        > getNextKnuthElements() method. For virtually all other LMs, this call is
        > triggered at the end of addAreas(). Does anyone know if moving that call to
        > addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was
        > intentional or something that was forgotten and nobody ever noticed.

        notifyEndOfLayout() in getNextKnuthElements() would be the mistake, not the move. If the tests run through ok, you should be fine. Otherwise, we are missing a test.

        > Apart from that, is everyone OK with using:
        > member: conserveMemoryPolicy
        > getter: isConserveMemoryPolicyEnabled()
        > setter: setConserveMemoryPolicy(boolean)

        /me nods

        > -> via the command-line: -conserve? or something more cryptic, like -cmp?

        -conserve is fine IMO.

        Show
        Jeremias Maerki added a comment - (In reply to comment #6) > Created an attachment (id=23375) [details] > Revised patch proposal > > OK, I'm about ready to commit. Small omission: the "-cache" in CommandLineOptions needs to be deleted. Otherwise, this looks ok, but I haven't done any tests myself. > One thing I'm slightly unsure of: current FOP Trunk has the call to > notifyEndOfLayout() for the TableCellLM at the end of the > getNextKnuthElements() method. For virtually all other LMs, this call is > triggered at the end of addAreas(). Does anyone know if moving that call to > addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was > intentional or something that was forgotten and nobody ever noticed. notifyEndOfLayout() in getNextKnuthElements() would be the mistake, not the move. If the tests run through ok, you should be fine. Otherwise, we are missing a test. > Apart from that, is everyone OK with using: > member: conserveMemoryPolicy > getter: isConserveMemoryPolicyEnabled() > setter: setConserveMemoryPolicy(boolean) /me nods > -> via the command-line: -conserve? or something more cryptic, like -cmp? -conserve is fine IMO.
        Hide
        Laera Dario added a comment -

        (In reply to comment #8)
        > > One thing I'm slightly unsure of: current FOP Trunk has the call to
        > > notifyEndOfLayout() for the TableCellLM at the end of the
        > > getNextKnuthElements() method. For virtually all other LMs, this call is
        > > triggered at the end of addAreas(). Does anyone know if moving that call to
        > > addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was
        > > intentional or something that was forgotten and nobody ever noticed.
        >
        > notifyEndOfLayout() in getNextKnuthElements() would be the mistake, not the
        > move. If the tests run through ok, you should be fine. Otherwise, we are
        > missing a test.

        Alas, I'm no more able to comment this kind of details, I lost some of my knowledge in few months

        I agree about other comments

        Show
        Laera Dario added a comment - (In reply to comment #8) > > One thing I'm slightly unsure of: current FOP Trunk has the call to > > notifyEndOfLayout() for the TableCellLM at the end of the > > getNextKnuthElements() method. For virtually all other LMs, this call is > > triggered at the end of addAreas(). Does anyone know if moving that call to > > addAreas() could cause trouble? It breaks no tests, but I'm wondering if it was > > intentional or something that was forgotten and nobody ever noticed. > > notifyEndOfLayout() in getNextKnuthElements() would be the mistake, not the > move. If the tests run through ok, you should be fine. Otherwise, we are > missing a test. Alas, I'm no more able to comment this kind of details, I lost some of my knowledge in few months I agree about other comments
        Hide
        Andreas L. Delmelle added a comment -

        Changes committed with r753327.

        Thanks for submitting the patch, Daerio!

        Show
        Andreas L. Delmelle added a comment - Changes committed with r753327. Thanks for submitting the patch, Daerio!
        Hide
        Glenn Adams added a comment -

        batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed

        Show
        Glenn Adams added a comment - batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed

          People

          • Assignee:
            fop-dev
            Reporter:
            Laera Dario
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development