FOP
  1. FOP
  2. FOP-1271

[PATCH] FOP memory usage patches.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: None
    • Component/s: fo/unqualified
    • Labels:
      None
    • Environment:
      Operating System: All
      Platform: Other
    • External issue ID:
      41044

      Description

      As discussed on the dev list, the fo tree memory usage is somewhat excessive due
      mainly to a large number of properties objects being generated. A number of
      patches should be added here as these issues are slowly addressed.

      1. rsw_enumprop.patch
        19 kB
        Richard Wheeldon
      2. rsw_delprops1.patch
        84 kB
        Richard Wheeldon
      3. marker-attribute-opt.diff
        6 kB
        Andreas L. Delmelle
      4. marker-attribute-opt.diff
        6 kB
        Andreas L. Delmelle
      5. rsw_delprops2.patch
        116 kB
        Richard Wheeldon
      6. rsw_delprops3.patch
        67 kB
        Richard Wheeldon
      7. rsw_delprops4.patch
        87 kB
        Richard Wheeldon

        Activity

        Hide
        Glenn Adams added a comment -

        batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed

        Show
        Glenn Adams added a comment - batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed
        Hide
        Andreas L. Delmelle added a comment -

        Basics of the patch have been applied, with some modifications:

        • removed usage of the cache by the PropertyList and Common*-bundles
        • applied caching to a restricted set of Property types that can be safely canonicalized

        see: http://svn.apache.org/viewvc?view=rev&rev=554251

        Closing this one. For any improvements, just create a new report.

        Show
        Andreas L. Delmelle added a comment - Basics of the patch have been applied, with some modifications: removed usage of the cache by the PropertyList and Common*-bundles applied caching to a restricted set of Property types that can be safely canonicalized see: http://svn.apache.org/viewvc?view=rev&rev=554251 Closing this one. For any improvements, just create a new report.
        Hide
        Andreas L. Delmelle added a comment -

        Some more brainstorming about correcting the ideas:

        Make the PropertyCache more than a simple wrapper around a WeakHashMap, and instead of giving
        each Property type its own cache, make the Map multi-dimensional (or an array of WeakHashMaps?)
        The base Map would be, for example, a mapping of Class objects to a WeakHashMap, which would then
        contain the canonical instances for a particular Property type.

        Something like:

        public final class PropertyCache {

        private static Map typeCache = new HashMap();
        private static Map propCache;

        /**

        • Checks if the given property is present in the cache - if so, returns
        • a reference to the cached value. Otherwise the given object is added
        • to the cache and returned.
        • @param obj
        • @return the cached instance
          */
          public static synchronized Property fetch(Property prop) {

        Class propType = prop.getClass();
        propCache = (Map) typeCache.get(propType);
        if (propCache == null)

        { propCache = new WeakHashMap(); propCache.put(prop, prop); typeCache.put(propType, propCache); return prop; }

        else {
        Property cacheEntry = (Property) propCache.get(prop);
        if (cacheEntry != null)

        { return cacheEntry; }

        else

        { propCache.put(prop, prop); return prop; }

        }
        }
        }

        The PropertyMakers, in their turn, will use this cache whenever they receive a Property from the
        PropertyParser that can be safely canonicalized (absolute lengths, enums, strings, ...). If the Property
        they receive from the parser is a relative one, they bypass the cache and make sure to return the new
        instance.

        Show
        Andreas L. Delmelle added a comment - Some more brainstorming about correcting the ideas: Make the PropertyCache more than a simple wrapper around a WeakHashMap, and instead of giving each Property type its own cache, make the Map multi-dimensional (or an array of WeakHashMaps?) The base Map would be, for example, a mapping of Class objects to a WeakHashMap, which would then contain the canonical instances for a particular Property type. Something like: public final class PropertyCache { private static Map typeCache = new HashMap(); private static Map propCache; /** Checks if the given property is present in the cache - if so, returns a reference to the cached value. Otherwise the given object is added to the cache and returned. @param obj @return the cached instance */ public static synchronized Property fetch(Property prop) { Class propType = prop.getClass(); propCache = (Map) typeCache.get(propType); if (propCache == null) { propCache = new WeakHashMap(); propCache.put(prop, prop); typeCache.put(propType, propCache); return prop; } else { Property cacheEntry = (Property) propCache.get(prop); if (cacheEntry != null) { return cacheEntry; } else { propCache.put(prop, prop); return prop; } } } } The PropertyMakers, in their turn, will use this cache whenever they receive a Property from the PropertyParser that can be safely canonicalized (absolute lengths, enums, strings, ...). If the Property they receive from the parser is a relative one, they bypass the cache and make sure to return the new instance.
        Hide
        Andreas L. Delmelle added a comment -

        More info:
        Reverting the change in the usage of the PropertyCache by PropertyList solves 11 of the 70 errors I
        initially encountered. The others are related to the Common* property bundles that include properties
        that can be percentages (padding, margin...)

        The idea is right, but the implementation is slightly in error. Most likely due to the inherent complexity
        of the properties package...
        Once you get it, it's really simple actually:
        Every Property class has its own particular Maker that deals with converting an attribute value to the
        correct type of Property instance. FOPropertyMapping initializes a cache of those makers, one for each
        property defined in the XSL-FO Rec. (still limited to 1.0)
        The PropertyList deals with the entire attribute list for a single FO. Given an Attributes instance, it
        extracts a few properties first -font-size, column-number- because of possible dependencies in
        other properties --em-lengths, from-table-column(). All the others are looped over: fetch Maker and
        call Maker.make(), basically, which triggers the PropertyParser to parse the expression and return a
        generic Property instance, possibly a compound or a list of Property instances...

        Thinking some more about the general approach, the pseudo-singleton pattern by itself is not enough
        here (private constructors and a getInstance()).
        I'm thinking of also weaving some of it into the Maker-logic. Have the Makers bear the responsibility
        that no necessary duplicates are missing (as is the case for the LengthBase of "50%" for instance, after
        applying the patch)
        Also, I'm going to set aside the idea about creating canonical Common* bundles FTM, apart from the
        quite common "no border, no padding, no background" scenario. The padding length-conditionals and
        BorderInfos are already difficult enough to handle. If the border, padding and background properties by
        themselves are guaranteed to be canonical, then a lot of the 'duplicate' bundles will be nothing but
        wrappers around a bunch of references to the very same BorderInfos, absolute padding-widths, etc.
        That's already bound to save a lot.

        There seems to be an inherent error, not only WRT percentages, but more generally any relative value. If
        I get the drift of the patch correctly, it creates a canonical Property instance for the attribute:
        width="2em"

        Meaning: for a LengthProperty with an explicitly specified value of "2em"...? What if we then later on
        encounter that same attribute value on another, very likely unrelated node?

        Show
        Andreas L. Delmelle added a comment - More info: Reverting the change in the usage of the PropertyCache by PropertyList solves 11 of the 70 errors I initially encountered. The others are related to the Common* property bundles that include properties that can be percentages (padding, margin...) The idea is right, but the implementation is slightly in error. Most likely due to the inherent complexity of the properties package... Once you get it, it's really simple actually: Every Property class has its own particular Maker that deals with converting an attribute value to the correct type of Property instance. FOPropertyMapping initializes a cache of those makers, one for each property defined in the XSL-FO Rec. (still limited to 1.0) The PropertyList deals with the entire attribute list for a single FO. Given an Attributes instance, it extracts a few properties first - font-size, column-number - because of possible dependencies in other properties --em-lengths, from-table-column(). All the others are looped over: fetch Maker and call Maker.make(), basically, which triggers the PropertyParser to parse the expression and return a generic Property instance, possibly a compound or a list of Property instances... Thinking some more about the general approach, the pseudo-singleton pattern by itself is not enough here (private constructors and a getInstance()). I'm thinking of also weaving some of it into the Maker-logic. Have the Makers bear the responsibility that no necessary duplicates are missing (as is the case for the LengthBase of "50%" for instance, after applying the patch) Also, I'm going to set aside the idea about creating canonical Common* bundles FTM, apart from the quite common "no border, no padding, no background" scenario. The padding length-conditionals and BorderInfos are already difficult enough to handle. If the border, padding and background properties by themselves are guaranteed to be canonical, then a lot of the 'duplicate' bundles will be nothing but wrappers around a bunch of references to the very same BorderInfos, absolute padding-widths, etc. That's already bound to save a lot. There seems to be an inherent error, not only WRT percentages, but more generally any relative value. If I get the drift of the patch correctly, it creates a canonical Property instance for the attribute: width="2em" Meaning: for a LengthProperty with an explicitly specified value of "2em"...? What if we then later on encounter that same attribute value on another, very likely unrelated node?
        Hide
        Andreas L. Delmelle added a comment -

        Sorry, the block should precisely only be entered when it is not auto.

        Another dead end

        Show
        Andreas L. Delmelle added a comment - Sorry, the block should precisely only be entered when it is not auto. Another dead end
        Hide
        Andreas L. Delmelle added a comment -

        Status update: something really weird seems to be going on here.
        Continuing the tests with 'block-container_content_size_percentage.xml'.

        I placed a breakpoint around where the error originates ( AbstractBaseLM, line 89 ), then on each break
        checked the call stack, and noticed that for all nested block-containers except the first one, the stack
        points back to BlockContainerLM line 218, which is inside a conditional block that should only be entered
        if width="auto"

        The search continues...

        Show
        Andreas L. Delmelle added a comment - Status update: something really weird seems to be going on here. Continuing the tests with 'block-container_content_size_percentage.xml'. I placed a breakpoint around where the error originates ( AbstractBaseLM, line 89 ), then on each break checked the call stack, and noticed that for all nested block-containers except the first one, the stack points back to BlockContainerLM line 218, which is inside a conditional block that should only be entered if width="auto" The search continues...
        Hide
        Andreas L. Delmelle added a comment -

        Hi Richard,

        I've finally found some time to finish reviewing your patch, and it looked good at first glance, apart
        from some style issues (javadocs, bracing conditional branches even if there's only one statement,
        blabla) and what I take to be some small typos (for example: in CommonHyphenation.equals(), you
        compared the country, language and script properties to themselves instead of to their counterparts
        from the other instance).

        Unfortunately, when I ran the junit tests, there seems to be a lot of work before we can apply it as is... I
        get 70 errors. :/ (a lot of them NPEs)
        Am I doing something wrong, or did you skip the testsuite?

        The only thing I really threw out were the deprecated-tags from CompoundDataType and
        LengthRangeProperty. This does need some further thought, nevertheless.
        For CommonBorderPaddingBackground.setBorderInfo(), I agreed. So much even that I removed
        setBorderInfo() and setPadding() completely, instead of merely deprecating.

        For now, I'll offer a bit more info on the rationale behind the mutability of compounds, and an idea on
        how to tackle it.
        Compounds would be tricky in terms of canonicalization, because, in order to judge which compound
        instance you need, you have to have all the components available (e.g.: for space-before you need
        the .minimum, .maximum, .optimum, .conditionality and .precedence components).
        This means that, currently, in order to correctly canonicalize a compound, we would have to wait until
        all attributes for a given FO have been converted to properties. We cannot do so at the time the
        compound is first created, since it will at that point only have the specified value of /one/ of the
        components (the first one in the list)

        The sole reason why they are mutable is that they are not created in one go, but rather, the base-
        property is created once the first component is encountered in the attribute-list. Further on in the
        process, as the other components are converted, the PropertyMakers need to be able to change the
        base-property (through setComponent())

        A possible solution I'm thinking of is to rewrite part of the convertAttributeToProperty-loop in
        PropertyList. If we can make sure that compounds are always created as a single compound property
        (instead of the multiple components each setting the component of a base-property), then they would
        not need to be mutable at all, and the issue is rendered moot.
        FWIW, I've always disliked a bit the fact that the attributes are simply looped over. The hacks a bit
        higher up, in addAttributesToList(), for extracting column-number and font-size before all others, are
        to me an indication that this should be revisited.

        As I roughly see it ATM, a compound would be made/instantiated using a list of attributes, instead of
        only one. As soon as one component is encountered, we would immediately scan the Attributes,
        extracting all the other components, and can then pass them all into the compound's PropertyMaker. As
        such, there is no more need for the setComponent() methods, since by converting one component's
        attribute, we would immediately convert the whole compound.

        WDYT?

        Andreas

        Show
        Andreas L. Delmelle added a comment - Hi Richard, I've finally found some time to finish reviewing your patch, and it looked good at first glance, apart from some style issues (javadocs, bracing conditional branches even if there's only one statement, blabla) and what I take to be some small typos (for example: in CommonHyphenation.equals(), you compared the country, language and script properties to themselves instead of to their counterparts from the other instance). Unfortunately, when I ran the junit tests, there seems to be a lot of work before we can apply it as is... I get 70 errors. :/ (a lot of them NPEs) Am I doing something wrong, or did you skip the testsuite? The only thing I really threw out were the deprecated-tags from CompoundDataType and LengthRangeProperty. This does need some further thought, nevertheless. For CommonBorderPaddingBackground.setBorderInfo(), I agreed. So much even that I removed setBorderInfo() and setPadding() completely, instead of merely deprecating. For now, I'll offer a bit more info on the rationale behind the mutability of compounds, and an idea on how to tackle it. Compounds would be tricky in terms of canonicalization, because, in order to judge which compound instance you need, you have to have all the components available (e.g.: for space-before you need the .minimum, .maximum, .optimum, .conditionality and .precedence components). This means that, currently, in order to correctly canonicalize a compound, we would have to wait until all attributes for a given FO have been converted to properties. We cannot do so at the time the compound is first created, since it will at that point only have the specified value of /one/ of the components (the first one in the list) The sole reason why they are mutable is that they are not created in one go, but rather, the base- property is created once the first component is encountered in the attribute-list. Further on in the process, as the other components are converted, the PropertyMakers need to be able to change the base-property (through setComponent()) A possible solution I'm thinking of is to rewrite part of the convertAttributeToProperty-loop in PropertyList. If we can make sure that compounds are always created as a single compound property (instead of the multiple components each setting the component of a base-property), then they would not need to be mutable at all, and the issue is rendered moot. FWIW, I've always disliked a bit the fact that the attributes are simply looped over. The hacks a bit higher up, in addAttributesToList(), for extracting column-number and font-size before all others, are to me an indication that this should be revisited. As I roughly see it ATM, a compound would be made/instantiated using a list of attributes, instead of only one. As soon as one component is encountered, we would immediately scan the Attributes, extracting all the other components, and can then pass them all into the compound's PropertyMaker. As such, there is no more need for the setComponent() methods, since by converting one component's attribute, we would immediately convert the whole compound. WDYT? Andreas
        Hide
        Richard Wheeldon added a comment -

        Attachment rsw_delprops4.patch has been added with description: Updated patch against svn head.

        Show
        Richard Wheeldon added a comment - Attachment rsw_delprops4.patch has been added with description: Updated patch against svn head.
        Hide
        Richard Wheeldon added a comment -

        This looks about as good as it's going to get for the moment. The TXTHandler
        issues turned out not to be relevant - nothing seems to be accessing this code
        and simply removing it (as this patch does) affects nothing else insofar as I
        can tell. I'd be interested in any feedback on this. Unless this patch breaks
        something I haven't accounted for, I'd suggest applying it to trunk as-is.

        Show
        Richard Wheeldon added a comment - This looks about as good as it's going to get for the moment. The TXTHandler issues turned out not to be relevant - nothing seems to be accessing this code and simply removing it (as this patch does) affects nothing else insofar as I can tell. I'd be interested in any feedback on this. Unless this patch breaks something I haven't accounted for, I'd suggest applying it to trunk as-is.
        Show
        Richard Wheeldon added a comment - See also http://issues.apache.org/bugzilla/show_bug.cgi?id=41656
        Hide
        Richard Wheeldon added a comment -

        Attachment rsw_delprops3.patch has been added with description: Updated properties patch

        Show
        Richard Wheeldon added a comment - Attachment rsw_delprops3.patch has been added with description: Updated properties patch
        Hide
        Richard Wheeldon added a comment -

        Relative to revision 496642. Includes previous changes and further
        modifications to CommonXXX and XXXProperty classes. Breaks TXTHandler and will
        affect any other code which relies on Mutable properties. For the most part,
        the patch consists of adding hashCode() and equals() methods and replacing
        constructor calls with calls to static getInstance methods.

        Show
        Richard Wheeldon added a comment - Relative to revision 496642. Includes previous changes and further modifications to CommonXXX and XXXProperty classes. Breaks TXTHandler and will affect any other code which relies on Mutable properties. For the most part, the patch consists of adding hashCode() and equals() methods and replacing constructor calls with calls to static getInstance methods.
        Hide
        Simon Pepping added a comment -

        Patch 19177 applied. Thanks.

        Show
        Simon Pepping added a comment - Patch 19177 applied. Thanks.
        Hide
        Simon Pepping added a comment -

        Also included in the following attachment

        Show
        Simon Pepping added a comment - Also included in the following attachment
        Hide
        Richard Wheeldon added a comment -

        Attachment rsw_delprops2.patch has been added with description: More getInstance() replacements, final property members, etc.

        Show
        Richard Wheeldon added a comment - Attachment rsw_delprops2.patch has been added with description: More getInstance() replacements, final property members, etc.
        Hide
        Richard Wheeldon added a comment -

        Reduces memory usage, but breaks TXTHandler. Covers all properties not
        implementing CompoundDatatype. Further thought is needed regarding
        CompoundDatatype examples and other performance issues.

        Show
        Richard Wheeldon added a comment - Reduces memory usage, but breaks TXTHandler. Covers all properties not implementing CompoundDatatype. Further thought is needed regarding CompoundDatatype examples and other performance issues.
        Hide
        Andreas L. Delmelle added a comment -

        (In reply to comment #11)

        Thanks for the info, Richard.

        Just a minor clarification:
        > 3. String.intern() does require an existing String.

        Obviously, explicit internalization does, but what I was referring to earlier is the difference between
        explicit (run-time) and implicit (compile-time) internalization: in the latter case, I'm pretty sure a separate
        instance never gets created...

        Show
        Andreas L. Delmelle added a comment - (In reply to comment #11) Thanks for the info, Richard. Just a minor clarification: > 3. String.intern() does require an existing String. Obviously, explicit internalization does, but what I was referring to earlier is the difference between explicit (run-time) and implicit (compile-time) internalization: in the latter case, I'm pretty sure a separate instance never gets created...
        Hide
        Richard Wheeldon added a comment -

        Some notes regarding the comments so far:
        1. An immutable class is guaranteed to be thread safe. I'm aiming at changing
        all property classes to be this way, as I believe they were intended to be.
        2. Constructing lots of short lived objects and comparing them against a few
        long lived ones is much quicker than than creating lots of long lived objects.
        If we were writing in C this would not always be the case - there would be no
        tree traversal and an equal number of malloc/frees. Given that this is Java, we
        reduce the number of GC passes and allow the newer collectors to work more
        efficiently.
        3. String.intern() does require an existing String.
        4. The location of PropertyCache and the cached values can be changed later if
        required.

        Show
        Richard Wheeldon added a comment - Some notes regarding the comments so far: 1. An immutable class is guaranteed to be thread safe. I'm aiming at changing all property classes to be this way, as I believe they were intended to be. 2. Constructing lots of short lived objects and comparing them against a few long lived ones is much quicker than than creating lots of long lived objects. If we were writing in C this would not always be the case - there would be no tree traversal and an equal number of malloc/frees. Given that this is Java, we reduce the number of GC passes and allow the newer collectors to work more efficiently. 3. String.intern() does require an existing String. 4. The location of PropertyCache and the cached values can be changed later if required.
        Hide
        Andreas L. Delmelle added a comment -

        (In reply to comment #9)
        > Finally got around to performing the measurements, and I was wrong about this. Using Maps is much
        > faster, up to about five times faster with value-lists as small as 10 elements... Richard's is definitely the
        > way to go.

        As a consequence of these results, I adapted my change in Marker.java to use Maps instead of Lists, and
        committed it: http://svn.apache.org/viewvc?view=rev&rev=487972

        Show
        Andreas L. Delmelle added a comment - (In reply to comment #9) > Finally got around to performing the measurements, and I was wrong about this. Using Maps is much > faster, up to about five times faster with value-lists as small as 10 elements... Richard's is definitely the > way to go. As a consequence of these results, I adapted my change in Marker.java to use Maps instead of Lists, and committed it: http://svn.apache.org/viewvc?view=rev&rev=487972
        Hide
        Andreas L. Delmelle added a comment -

        (In reply to comment #8)
        > ... I'd even go as far as stating that the list iteration will perform roughly the same as a
        > map lookup. All we lose by using a List is the extra space needed for the Map.Entrys...

        Just FYI:
        Finally got around to performing the measurements, and I was wrong about this. Using Maps is much
        faster, up to about five times faster with value-lists as small as 10 elements... Richard's is definitely the
        way to go.

        Show
        Andreas L. Delmelle added a comment - (In reply to comment #8) > ... I'd even go as far as stating that the list iteration will perform roughly the same as a > map lookup. All we lose by using a List is the extra space needed for the Map.Entrys... Just FYI: Finally got around to performing the measurements, and I was wrong about this. Using Maps is much faster, up to about five times faster with value-lists as small as 10 elements... Richard's is definitely the way to go.
        Hide
        Andreas L. Delmelle added a comment -

        (In reply to comment #7)

        > Andreas, why is your valueCache not also a map on the value?

        I thought of that too, but since we're trying to save on resources here I'm a bit wary of the extra objects
        a Map generates under-the-hood.
        ArrayLists have very low overhead in that respect. Very little more space needed on top of the instances
        it stores. Given that the number of distinct values for one attribute is expected to remain at a
        reasonable level, I'd even go as far as stating that the list iteration will perform roughly the same as a
        map lookup. All we lose by using a List is the extra space needed for the Map.Entrys...

        > I am not sure that your approach is worth the trouble.

        Indeed, see also Joerg's response on fop-dev.
        I was not either, but it would get interesting if such an approach could be used on the complex types.

        On the other hand, even if you start from a type containing only two String instance members, having
        thousands of references to one singular instance of that type is mathematically always better than
        creating thousands of instances with the two String references being identical.
        The benefit of Richard's approach is that it could rather easily be ported to all Property types. My
        approach seems more appropriate for the Attribute type. Looking closer, it would get very difficult to
        port it to the properties.

        > An EnumProperty and an Attribute object contain only a name and a value,
        > and two more strings in the latter case. It is quite efficient to use that as a key,
        > and creating an object for lookup is not very costly in view of the efficient hash
        > lookup you achieve.
        > First doing a lookup on name and then on value may cost more than it saves.

        Could be... I really should try to measure it.

        > Of course a map in which the key and the value are always identical seems
        > strange. I have looked into Set, which guarantees uniqueness of the objects, but
        > it does not allow efficient retrieval.

        Hence, index-based list-retrieval! 8)
        The uniqueness follows from the implementation, I think...
        If the Maps and Lists are properly synchronized, then there will always be only one distinct instance per
        name/value.
        Each get() following the initial put() will return that singular instance.

        > At first I was wary about Richard's commenting out of properties. In principle
        > every property has an effect on layout and must be looked up. But if we wish to
        > release a production version, we cannot waste performance on such a principle.
        > Therefore I think a careful commenting out of properties not looked up is useful.

        Agreed here. We can always reactivate them later.

        Show
        Andreas L. Delmelle added a comment - (In reply to comment #7) > Andreas, why is your valueCache not also a map on the value? I thought of that too, but since we're trying to save on resources here I'm a bit wary of the extra objects a Map generates under-the-hood. ArrayLists have very low overhead in that respect. Very little more space needed on top of the instances it stores. Given that the number of distinct values for one attribute is expected to remain at a reasonable level, I'd even go as far as stating that the list iteration will perform roughly the same as a map lookup. All we lose by using a List is the extra space needed for the Map.Entrys... > I am not sure that your approach is worth the trouble. Indeed, see also Joerg's response on fop-dev. I was not either, but it would get interesting if such an approach could be used on the complex types. On the other hand, even if you start from a type containing only two String instance members, having thousands of references to one singular instance of that type is mathematically always better than creating thousands of instances with the two String references being identical. The benefit of Richard's approach is that it could rather easily be ported to all Property types. My approach seems more appropriate for the Attribute type. Looking closer, it would get very difficult to port it to the properties. > An EnumProperty and an Attribute object contain only a name and a value, > and two more strings in the latter case. It is quite efficient to use that as a key, > and creating an object for lookup is not very costly in view of the efficient hash > lookup you achieve. > First doing a lookup on name and then on value may cost more than it saves. Could be... I really should try to measure it. > Of course a map in which the key and the value are always identical seems > strange. I have looked into Set, which guarantees uniqueness of the objects, but > it does not allow efficient retrieval. Hence, index-based list-retrieval! 8) The uniqueness follows from the implementation, I think... If the Maps and Lists are properly synchronized, then there will always be only one distinct instance per name/value. Each get() following the initial put() will return that singular instance. > At first I was wary about Richard's commenting out of properties. In principle > every property has an effect on layout and must be looked up. But if we wish to > release a production version, we cannot waste performance on such a principle. > Therefore I think a careful commenting out of properties not looked up is useful. Agreed here. We can always reactivate them later.
        Hide
        Simon Pepping added a comment -

        (In reply to comment #6)

        I think Richard's approach is OK for read-only properties.

        Andreas, why is your valueCache not also a map on the value?

        I am not sure that your approach is worth the trouble. An EnumProperty and an
        Attribute object contain only a name and a value, and two more strings in the
        latter case. It is quite efficient to use that as a key, and creating an object
        for lookup is not very costly in view of the efficient hash lookup you achieve.
        First doing a lookup on name and then on value may cost more than it saves.

        Of course a map in which the key and the value are always identical seems
        strange. I have looked into Set, which guarantees uniqueness of the objects, but
        it does not allow efficient retrieval.

        At first I was wary about Richard's commenting out of properties. In principle
        every property has an effect on layout and must be looked up. But if we wish to
        release a production version, we cannot waste performance on such a principle.
        Therefore I think a careful commenting out of properties not looked up is useful.

        Regards, Simon

        > Richard,
        >
        > (In reply to comment #1)
        > > Created an attachment (id=19176) [edit] [edit]
        > > Replace EnumProperty and EnumNumber constructors with getInstance()
        >
        > Looking again at your patch, one more remark:
        > The only thing that doesn't completely sit right with me, is the lookup based
        on an instance of the class
        > in question... This seems to me to defeat the purpose of reducing the
        instantiation rate. True, the
        > instance's scope remains limited to the getInstance() method, but they're
        created (and have to be GC'ed)
        > anyway :/
        >
        > See my proposed change in Marker.java: the MarkerAttribute constructor is
        called only when no
        > corresponding instance exists in the map yet. I'm still chewing on how to
        generalize such an approach
        > in the properties package.
        > In the meantime, also did something similar for fo.expr.NumericProperty...
        again nothing broke, but I
        > still have to dig up a testcase, to see how precisely this impacts the
        resource consumption before
        > committing such stuff.

        Show
        Simon Pepping added a comment - (In reply to comment #6) I think Richard's approach is OK for read-only properties. Andreas, why is your valueCache not also a map on the value? I am not sure that your approach is worth the trouble. An EnumProperty and an Attribute object contain only a name and a value, and two more strings in the latter case. It is quite efficient to use that as a key, and creating an object for lookup is not very costly in view of the efficient hash lookup you achieve. First doing a lookup on name and then on value may cost more than it saves. Of course a map in which the key and the value are always identical seems strange. I have looked into Set, which guarantees uniqueness of the objects, but it does not allow efficient retrieval. At first I was wary about Richard's commenting out of properties. In principle every property has an effect on layout and must be looked up. But if we wish to release a production version, we cannot waste performance on such a principle. Therefore I think a careful commenting out of properties not looked up is useful. Regards, Simon > Richard, > > (In reply to comment #1) > > Created an attachment (id=19176) [edit] [edit] > > Replace EnumProperty and EnumNumber constructors with getInstance() > > Looking again at your patch, one more remark: > The only thing that doesn't completely sit right with me, is the lookup based on an instance of the class > in question... This seems to me to defeat the purpose of reducing the instantiation rate. True, the > instance's scope remains limited to the getInstance() method, but they're created (and have to be GC'ed) > anyway :/ > > See my proposed change in Marker.java: the MarkerAttribute constructor is called only when no > corresponding instance exists in the map yet. I'm still chewing on how to generalize such an approach > in the properties package. > In the meantime, also did something similar for fo.expr.NumericProperty... again nothing broke, but I > still have to dig up a testcase, to see how precisely this impacts the resource consumption before > committing such stuff.
        Hide
        Andreas L. Delmelle added a comment -

        Richard,

        (In reply to comment #1)
        > Created an attachment (id=19176) [edit]
        > Replace EnumProperty and EnumNumber constructors with getInstance()

        Looking again at your patch, one more remark:
        The only thing that doesn't completely sit right with me, is the lookup based on an instance of the class
        in question... This seems to me to defeat the purpose of reducing the instantiation rate. True, the
        instance's scope remains limited to the getInstance() method, but they're created (and have to be GC'ed)
        anyway :/
        [Compare it to String internalization: I doubt the JVM creates a String instance first and throws it away
        afterwards if an internalized String for that text already exists. I'd think the extra instance is never even
        created...?]

        See my proposed change in Marker.java: the MarkerAttribute constructor is called only when no
        corresponding instance exists in the map yet. I'm still chewing on how to generalize such an approach
        in the properties package.
        In the meantime, also did something similar for fo.expr.NumericProperty... again nothing broke, but I
        still have to dig up a testcase, to see how precisely this impacts the resource consumption before
        committing such stuff.

        Example of the expected results of the change in Marker.java:
        suppose 200 Markers, one-element subtree with attribute-set of 10 attributes

        BEFORE:
        2000 separate MarkerAttribute instances w/ 4 String references each, good for 8000 Strings in toto.

        Case 1
        -> recurring identical attribute-set
        AFTER: 1 Map with 10 entries, 10 Strings plus 10 Lists consisting of one MarkerAttribute instance each,
        makes a total of only 10 instances (40 Strings) and 2000 references to one of those 10 instances.

        Case 2
        -> distinct attribute-sets (recurring names, different values)
        NEW: 1 Map with 10 entries, 10 Strings plus 10 Lists consisting of 200 MarkerAttribute instances each...
        Slight drawback because of the added Map + List instances, and the overhead of the lookup.

        The rest lies somewhere in between.
        The general use-cases would incline towards Case 1 more than Case 2 (or even a case where there are
        200 distinct attribute-sets that have absolutely no common attributes). The more distinct names, the
        larger the Map, but this is limited to PROPERTY_COUNT keys. The Lists can be expected to remain
        reasonably sized (practical limitations on the number of distinct possible values)

        If the synchronization works out as expected, a generalized approach for all Property subtypes could
        vastly reduce memory usage in multi-thread context (the above figures multiplied by a factor of 15, or
        more), especially if all the different threads use the same set of templates, generating heaps of identical
        attributes.

        Show
        Andreas L. Delmelle added a comment - Richard, (In reply to comment #1) > Created an attachment (id=19176) [edit] > Replace EnumProperty and EnumNumber constructors with getInstance() Looking again at your patch, one more remark: The only thing that doesn't completely sit right with me, is the lookup based on an instance of the class in question... This seems to me to defeat the purpose of reducing the instantiation rate. True, the instance's scope remains limited to the getInstance() method, but they're created (and have to be GC'ed) anyway :/ [Compare it to String internalization: I doubt the JVM creates a String instance first and throws it away afterwards if an internalized String for that text already exists. I'd think the extra instance is never even created...?] See my proposed change in Marker.java: the MarkerAttribute constructor is called only when no corresponding instance exists in the map yet. I'm still chewing on how to generalize such an approach in the properties package. In the meantime, also did something similar for fo.expr.NumericProperty... again nothing broke, but I still have to dig up a testcase, to see how precisely this impacts the resource consumption before committing such stuff. Example of the expected results of the change in Marker.java: suppose 200 Markers, one-element subtree with attribute-set of 10 attributes BEFORE: 2000 separate MarkerAttribute instances w/ 4 String references each, good for 8000 Strings in toto. Case 1 -> recurring identical attribute-set AFTER: 1 Map with 10 entries, 10 Strings plus 10 Lists consisting of one MarkerAttribute instance each, makes a total of only 10 instances (40 Strings) and 2000 references to one of those 10 instances. Case 2 -> distinct attribute-sets (recurring names, different values) NEW: 1 Map with 10 entries, 10 Strings plus 10 Lists consisting of 200 MarkerAttribute instances each... Slight drawback because of the added Map + List instances, and the overhead of the lookup. The rest lies somewhere in between. The general use-cases would incline towards Case 1 more than Case 2 (or even a case where there are 200 distinct attribute-sets that have absolutely no common attributes). The more distinct names, the larger the Map, but this is limited to PROPERTY_COUNT keys. The Lists can be expected to remain reasonably sized (practical limitations on the number of distinct possible values) If the synchronization works out as expected, a generalized approach for all Property subtypes could vastly reduce memory usage in multi-thread context (the above figures multiplied by a factor of 15, or more), especially if all the different threads use the same set of templates, generating heaps of identical attributes.
        Hide
        Andreas L. Delmelle added a comment -

        Attachment marker-attribute-opt.diff has been added with description: Correction...

        Show
        Andreas L. Delmelle added a comment - Attachment marker-attribute-opt.diff has been added with description: Correction...
        Hide
        Andreas L. Delmelle added a comment -

        Just realized there was still a slight error in the previous patch that kind of
        defeated the whole purpose...

        Show
        Andreas L. Delmelle added a comment - Just realized there was still a slight error in the previous patch that kind of defeated the whole purpose...
        Hide
        Andreas L. Delmelle added a comment -

        Attachment marker-attribute-opt.diff has been added with description: Patch for a similar optimization in Marker.java

        Show
        Andreas L. Delmelle added a comment - Attachment marker-attribute-opt.diff has been added with description: Patch for a similar optimization in Marker.java
        Hide
        Andreas L. Delmelle added a comment -

        I wonder... especially how this would impact multi-thread processing (see
        synchronization)

        Seems to break nothing, but I currently have no good testcases to be able to
        judge whether this would mean an improvement. Purely from an aesthetical point
        of view, it looks better, though...

        Opinions?

        Show
        Andreas L. Delmelle added a comment - I wonder... especially how this would impact multi-thread processing (see synchronization) Seems to break nothing, but I currently have no good testcases to be able to judge whether this would mean an improvement. Purely from an aesthetical point of view, it looks better, though... Opinions?
        Hide
        Andreas L. Delmelle added a comment -

        Richard,

        Nice job, so far!

        Been pondering a bit more about this myself, and thought up a few things to keep in mind in the
        process:

        1) In some cases a Maker uses instance methods of the FONode for which the property is being created.
        Even the most trivial FONode.getNameId() can already result in a slightly different Property instance
        being created.
        2) Of specific interest are percentages and other relative values.
        Their specified value -say 70%- may be the same, but the underlying LengthBase has an FObj
        instance member that comes into play when resolving that percentage during layout. A Property
        instance for a percentage value created for one FObj can currently not be re-used for another FObj.
        Percentages need to be excluded, I think, and should always trigger the creation of new instances,
        unless there's also a design-change making it possible to reset the FObj member of a single
        LengthBase.

        3) Also of interest: ToBeImplementedProperty. We definitely should check whether we really need
        different instances, or if one instance would be enough. Nothing is done with them anyway. At the very
        least: one instance per distinct initial value should suffice.

        4) A LengthProperty instance for a property with a specified absolute value of "10pt" could be re-used
        for all <length> properties (font-size, line-height, ...). This is what your patch currently demonstrates
        for the Enums. Again, a very nice proof-of-concept!

        Just one tiny question/remark: would it make sense IYO to move the propertyCache to a central
        location, say a PropertyPool, sort of a central map structured by the property types, instead of keeping
        these caches local to the types themselves? Provide two access points -get() and put()- and keep all
        the other related logic encapsulated in there...

        Show
        Andreas L. Delmelle added a comment - Richard, Nice job, so far! Been pondering a bit more about this myself, and thought up a few things to keep in mind in the process: 1) In some cases a Maker uses instance methods of the FONode for which the property is being created. Even the most trivial FONode.getNameId() can already result in a slightly different Property instance being created. 2) Of specific interest are percentages and other relative values. Their specified value - say 70% - may be the same, but the underlying LengthBase has an FObj instance member that comes into play when resolving that percentage during layout. A Property instance for a percentage value created for one FObj can currently not be re-used for another FObj. Percentages need to be excluded, I think, and should always trigger the creation of new instances, unless there's also a design-change making it possible to reset the FObj member of a single LengthBase. 3) Also of interest: ToBeImplementedProperty. We definitely should check whether we really need different instances, or if one instance would be enough. Nothing is done with them anyway. At the very least: one instance per distinct initial value should suffice. 4) A LengthProperty instance for a property with a specified absolute value of "10pt" could be re-used for all <length> properties (font-size, line-height, ...). This is what your patch currently demonstrates for the Enums. Again, a very nice proof-of-concept! Just one tiny question/remark: would it make sense IYO to move the propertyCache to a central location, say a PropertyPool, sort of a central map structured by the property types, instead of keeping these caches local to the types themselves? Provide two access points - get() and put() - and keep all the other related logic encapsulated in there...
        Hide
        Richard Wheeldon added a comment -

        Attachment rsw_delprops1.patch has been added with description: Comment out unused properties.

        Show
        Richard Wheeldon added a comment - Attachment rsw_delprops1.patch has been added with description: Comment out unused properties.
        Hide
        Richard Wheeldon added a comment -

        Comments out only those properties whose getters/setters do not form part of
        the public API. For example Block retains commonAural despite this never being
        used, because it forms part of the API via Block.getCommonAural().

        Show
        Richard Wheeldon added a comment - Comments out only those properties whose getters/setters do not form part of the public API. For example Block retains commonAural despite this never being used, because it forms part of the API via Block.getCommonAural().
        Hide
        Richard Wheeldon added a comment -

        Attachment rsw_enumprop.patch has been added with description: Replace EnumProperty and EnumNumber constructors with getInstance()

        Show
        Richard Wheeldon added a comment - Attachment rsw_enumprop.patch has been added with description: Replace EnumProperty and EnumNumber constructors with getInstance()
        Hide
        Richard Wheeldon added a comment -

        On the fo test example I'm using here, this reduces the instance count for
        EnumNumber from ~100,000 to 3 and the instance count for EnumProperty from
        ~50000 to ~200.

        Show
        Richard Wheeldon added a comment - On the fo test example I'm using here, this reduces the instance count for EnumNumber from ~100,000 to 3 and the instance count for EnumProperty from ~50000 to ~200.

          People

          • Assignee:
            fop-dev
            Reporter:
            Richard Wheeldon
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development