Details

    • Type: Bug Bug
    • Status: Reopened
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2
    • Fix Version/s: 2.1.11
    • Component/s: * Cocoon Core
    • Labels:
      None
    • Environment:
      Operating System: Windows XP
      Platform: PC

      Description

      I'm currently looking into a memory leak issue at Apache Forrest. Forrest's
      site currently needs to be built with -Xmx128m because of this. I believe the
      issue is originated at Cocoon's LinkRewriterTransformer or XMLFileModule.

      A memory profiler shows lots (30MB+) of DOM DocumentImpls (150+ objects), which
      get referenced by XMLFileModule.DocumentHelper. Their URIs are linkmap-xxx.

      LinkRewriterTransformer#createTransformedLink(String) uses a InputModuleHelper,
      which seems to reference a XMLFileModule.
        ...
        newLink = (String) modHelper.getAttribute(this.objectModel,
                           ^^^^^^^^^
        ...

      The XMLFileModule keeps the visited documents in a map, which is where they
      build up.

      Just for testing, I changed XMLFileModule#getDocumentHelper(Configuration) from
        this.documents.put(src, new DocumentHelper(reload, cache, src, this));
      to
        return new DocumentHelper(reload, cache, src, this);

      Thus, a new DocumentHelper is created every time, instead of caching them. The
      result: No more memory problems, Apache Forrest's site builds again with -Xmx32.

      Ron

        Issue Links

          Activity

          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Blocked Blocked
          163d 20h 38m 1 Jean-Baptiste Quenot 23/Jan/06 18:33
          Blocked Blocked Continued Continued
          2d 5h 24m 1 Jörg Heinicke 25/Jan/06 23:57
          Continued Continued Closed Closed
          710d 23h 39m 1 Ralph Goers 06/Jan/08 23:37
          Closed Closed Reopened Reopened
          136d 14h 24m 1 Jasha Joachimsthal 22/May/08 15:01
          Ralph Goers made changes -
          Assignee Ralph Goers [ ralph.goers@dslextreme.com ]
          Jasha Joachimsthal made changes -
          Resolution Fixed [ 1 ]
          Status Closed [ 6 ] Reopened [ 4 ]
          Hide
          Jasha Joachimsthal added a comment - - edited
          AbstractJXPathModule.getAttribute() now always returns an Object of type String instead of any Object which breaks implementations that do not expect a String.
          Show
          Jasha Joachimsthal added a comment - - edited AbstractJXPathModule.getAttribute() now always returns an Object of type String instead of any Object which breaks implementations that do not expect a String.
          Ralph Goers made changes -
          Fix Version/s 2.1.11 [ 12312231 ]
          Status Continued [ 10001 ] Closed [ 6 ]
          Affects version (Component) Parent values: Components: Sitemap(10152).
          Fix version (Component) Parent values: Components: Sitemap(10229).
          Resolution Fixed [ 1 ]
          Hide
          Ralph Goers added a comment -
          XPathXMLFileModule is now also available in trunk.
          Show
          Ralph Goers added a comment - XPathXMLFileModule is now also available in trunk.
          Hide
          Ralph Goers added a comment -
          I have checked in XPathXMLFileModule which can be used as a replacement for XMLFileModule. It differs from XMLFileModule by
          1. It caches a DocumentInfo instead of a DocumentHelper. DocumentInfo contains the Document, its Validity, the url, the resolver and the expression cache. The expression cache is a ReferenceMap containing soft references.
          2. DocumentInfo objects are cached in an externally configurable cache. The default implementation uses ehcache.
          3. The soruce url can contain variables which will be resolved each time the input module is used. The cache key used is the url after resolving variables.
          4. Both the cacheable and reloadable parameters can be specified as variables which will be resolved each time the module is called.
          5. The amount of code run in a synchronized block is much smaller.

          XPathXMLFileModule performs about the same as XMLFileModule under light load. Under heavy load it experiences much more consistent behavior than XMLFileModule. In testing XMLFileModule showed a very large standard deviation in response time with a very large maximum response compared to the minimum. XPathXMLFileModule has a much smaller standard deviation and a smaller maximum response time.
          Show
          Ralph Goers added a comment - I have checked in XPathXMLFileModule which can be used as a replacement for XMLFileModule. It differs from XMLFileModule by 1. It caches a DocumentInfo instead of a DocumentHelper. DocumentInfo contains the Document, its Validity, the url, the resolver and the expression cache. The expression cache is a ReferenceMap containing soft references. 2. DocumentInfo objects are cached in an externally configurable cache. The default implementation uses ehcache. 3. The soruce url can contain variables which will be resolved each time the input module is used. The cache key used is the url after resolving variables. 4. Both the cacheable and reloadable parameters can be specified as variables which will be resolved each time the module is called. 5. The amount of code run in a synchronized block is much smaller. XPathXMLFileModule performs about the same as XMLFileModule under light load. Under heavy load it experiences much more consistent behavior than XMLFileModule. In testing XMLFileModule showed a very large standard deviation in response time with a very large maximum response compared to the minimum. XPathXMLFileModule has a much smaller standard deviation and a smaller maximum response time.
          Hide
          Ralph Goers added a comment -
          Possibly, but the caching is only one of the problems that needs to be addressed. Other issues include that document retrieval is done via a synchronized method (I'm still unclear as to why) and that the configured file needs to resolved using the PreparedVariableResolver. Basically, I plan on dumping the whole DocumentHelper object as it is tied to a fixed document, which is not at all useful for what I need. Furthermore, the code currently caches the DocumentHelper object. I plan on putting the document itself in the store, which seems much more appropriate.
          Show
          Ralph Goers added a comment - Possibly, but the caching is only one of the problems that needs to be addressed. Other issues include that document retrieval is done via a synchronized method (I'm still unclear as to why) and that the configured file needs to resolved using the PreparedVariableResolver. Basically, I plan on dumping the whole DocumentHelper object as it is tied to a fixed document, which is not at all useful for what I need. Furthermore, the code currently caches the DocumentHelper object. I plan on putting the document itself in the store, which seems much more appropriate.
          Hide
          Antonio Gallardo added a comment -
          I agree using the store is the best way. Sometimes less is more. Would you test my suggested implementation if I wrote it?
          Show
          Antonio Gallardo added a comment - I agree using the store is the best way. Sometimes less is more. Would you test my suggested implementation if I wrote it?
          Hide
          Ralph Goers added a comment -
          Actually, I don't care much what is done to XMLFileModule. I've basically decided that there are so many issues that I need fixed that I can't modify it in a backwards compatible way, so I'm writing a whole new module. My plan is to use the store instead of a map (although it might be a new instance of the store - not sure yet).
          Show
          Ralph Goers added a comment - Actually, I don't care much what is done to XMLFileModule. I've basically decided that there are so many issues that I need fixed that I can't modify it in a backwards compatible way, so I'm writing a whole new module. My plan is to use the store instead of a map (although it might be a new instance of the store - not sure yet).
          Hide
          Antonio Gallardo added a comment -
          For a first implementation a to get rid of the memory leak, we should implement a RefenceMap for documents.

          WDOT?
          Show
          Antonio Gallardo added a comment - For a first implementation a to get rid of the memory leak, we should implement a RefenceMap for documents. WDOT?
          Hide
          Ellis Pritchard added a comment -
          That's great; I'd noticed the ever-growing-cache problem, but not the synchronization problem, urgh!

          On the users mailing list, dynamic configuration was mentioned, i.e. adding the URI of the XML file to the string used by the module (name parameter to getAttribute()), Andrew Stevens suggested using XPointer notation:

          e.g. you could do <map:parameter name="foo" value="{myxmlfile:{1}.xml#xpointer(/document/bar/foo)}"/>

          Is this something you could include in the new version?
          Show
          Ellis Pritchard added a comment - That's great; I'd noticed the ever-growing-cache problem, but not the synchronization problem, urgh! On the users mailing list, dynamic configuration was mentioned, i.e. adding the URI of the XML file to the string used by the module (name parameter to getAttribute()), Andrew Stevens suggested using XPointer notation: e.g. you could do <map:parameter name="foo" value="{myxmlfile:{1}.xml#xpointer(/document/bar/foo)}"/> Is this something you could include in the new version?
          Hide
          Ralph Goers added a comment -
          Not yet. I hope to get to this in the next couple of days.
          Show
          Ralph Goers added a comment - Not yet. I hope to get to this in the next couple of days.
          Hide
          Gavin added a comment -
          Hi Guys, Any progress on this issue at all.
          Forrest is looking to a new release soon, and we have an issue that depends on this one currently scheduled to be fixed before the release.

          Thanks

          Gav...
          Show
          Gavin added a comment - Hi Guys, Any progress on this issue at all. Forrest is looking to a new release soon, and we have an issue that depends on this one currently scheduled to be fixed before the release. Thanks Gav...
          Jörg Heinicke made changes -
          Status On Hold [ 10000 ] Continued [ 10001 ]
          Ralph Goers made changes -
          Assignee Jean-Baptiste Quenot [ jbq ] Ralph Goers [ ralph.goers@dslextreme.com ]
          Bugzilla Id 36162
          Hide
          Ralph Goers added a comment -
          I'm assigning this to me as I need to make changes to it for my users.

          We are currently running with caching disabled because it returns the wrong content if we don't - the way caching is currently implemented you can't have an input module in the input source so you can only use fixed names. In my case they reference a pipeline that generates dynamic content - which doesn't work well since the cache key never changes. And with caching disabled it is a horrible bottleneck since it synchronizes while it is reading from disk - which happens on every request. Basically, XMLFileModule just doesn't work the way it should.
          Show
          Ralph Goers added a comment - I'm assigning this to me as I need to make changes to it for my users. We are currently running with caching disabled because it returns the wrong content if we don't - the way caching is currently implemented you can't have an input module in the input source so you can only use fixed names. In my case they reference a pipeline that generates dynamic content - which doesn't work well since the cache key never changes. And with caching disabled it is a horrible bottleneck since it synchronizes while it is reading from disk - which happens on every request. Basically, XMLFileModule just doesn't work the way it should.
          Hide
          Jörg Heinicke added a comment -
          This is a bad suggestion for a memory leak, isn't it?
          Show
          Jörg Heinicke added a comment - This is a bad suggestion for a memory leak, isn't it?
          Jean-Baptiste Quenot made changes -
          Assignee Cocoon Developers Team [ cocoon ] Jean-Baptiste Quenot [ jbq ]
          Jean-Baptiste Quenot made changes -
          Status Open [ 1 ] On Hold [ 10000 ]
          Hide
          Jean-Baptiste Quenot added a comment -
          Why don't you just disable caching on your instance of XMLFileModule?
          Show
          Jean-Baptiste Quenot added a comment - Why don't you just disable caching on your instance of XMLFileModule?
          Ross Gardler made changes -
          Link This issue blocks FOR-591 [ FOR-591 ]
          Ross Gardler made changes -
          Link This issue is blocked by FOR-591 [ FOR-591 ]
          Ross Gardler made changes -
          Link This issue is blocked by FOR-591 [ FOR-591 ]
          Ralph Goers made changes -
          Component/s * Cocoon Core [ 12310442 ]
          Component/s Blocks: (Undefined) [ 12310440 ]
          Bugzilla Id 36162
          Description I'm currently looking into a memory leak issue at Apache Forrest. Forrest's
          site currently needs to be built with -Xmx128m because of this. I believe the
          issue is originated at Cocoon's LinkRewriterTransformer or XMLFileModule.

          A memory profiler shows lots (30MB+) of DOM DocumentImpls (150+ objects), which
          get referenced by XMLFileModule.DocumentHelper. Their URIs are linkmap-xxx.

          LinkRewriterTransformer#createTransformedLink(String) uses a InputModuleHelper,
          which seems to reference a XMLFileModule.
            ...
            newLink = (String) modHelper.getAttribute(this.objectModel,
                               ^^^^^^^^^
            ...

          The XMLFileModule keeps the visited documents in a map, which is where they
          build up.

          Just for testing, I changed XMLFileModule#getDocumentHelper(Configuration) from
            this.documents.put(src, new DocumentHelper(reload, cache, src, this));
          to
            return new DocumentHelper(reload, cache, src, this);

          Thus, a new DocumentHelper is created every time, instead of caching them. The
          result: No more memory problems, Apache Forrest's site builds again with -Xmx32.

          Ron
          I'm currently looking into a memory leak issue at Apache Forrest. Forrest's
          site currently needs to be built with -Xmx128m because of this. I believe the
          issue is originated at Cocoon's LinkRewriterTransformer or XMLFileModule.

          A memory profiler shows lots (30MB+) of DOM DocumentImpls (150+ objects), which
          get referenced by XMLFileModule.DocumentHelper. Their URIs are linkmap-xxx.

          LinkRewriterTransformer#createTransformedLink(String) uses a InputModuleHelper,
          which seems to reference a XMLFileModule.
            ...
            newLink = (String) modHelper.getAttribute(this.objectModel,
                               ^^^^^^^^^
            ...

          The XMLFileModule keeps the visited documents in a map, which is where they
          build up.

          Just for testing, I changed XMLFileModule#getDocumentHelper(Configuration) from
            this.documents.put(src, new DocumentHelper(reload, cache, src, this));
          to
            return new DocumentHelper(reload, cache, src, this);

          Thus, a new DocumentHelper is created every time, instead of caching them. The
          result: No more memory problems, Apache Forrest's site builds again with -Xmx32.

          Ron
          Pier Fumagalli made changes -
          Workflow jira [ 12341318 ] Cocoon Workflow [ 12341451 ]
          Jeff Turner made changes -
          Field Original Value New Value
          issue.field.bugzillaimportkey 36162 12324883
          Hide
          Jörg Heinicke added a comment -
          Just to sum up:

          XMLFileModule holds its own caches for storing already processed documents ad
          expressions. Unfortunately the cache for the documents (a standard HashMap)
          grows and grows while the caches for the expressions use weak references.
          So the XMLFileModule should either use the official store for all its caches or
          at least not use a cache object that grows endlessly.
          Show
          Jörg Heinicke added a comment - Just to sum up: XMLFileModule holds its own caches for storing already processed documents ad expressions. Unfortunately the cache for the documents (a standard HashMap) grows and grows while the caches for the expressions use weak references. So the XMLFileModule should either use the official store for all its caches or at least not use a cache object that grows endlessly.
          Hide
          Ronald Blaschke added a comment -
          I am not very familiar with Cocoons internals, the changes I made are just a
          quick workaround, and probably not very cocoonish.
          Maybe XMLFileModule should not keep references to the DocumentHelpers, maybe
          LinkRewriterTransformer should use a new InputModuleHelper for each document, or
          maybe something completely different. ;-)
          Show
          Ronald Blaschke added a comment - I am not very familiar with Cocoons internals, the changes I made are just a quick workaround, and probably not very cocoonish. Maybe XMLFileModule should not keep references to the DocumentHelpers, maybe LinkRewriterTransformer should use a new InputModuleHelper for each document, or maybe something completely different. ;-)
          Hide
          Jörg Heinicke added a comment -
          There might be a reason for this "cache", so we might discuss it on the list
          first. Maybe it must be changed to use the store.
          Show
          Jörg Heinicke added a comment - There might be a reason for this "cache", so we might discuss it on the list first. Maybe it must be changed to use the store.
          Hide
          Antonio Gallardo added a comment -
          Can you send a patch? I will gladly add to the cocoon trunk. :-)
          Show
          Antonio Gallardo added a comment - Can you send a patch? I will gladly add to the cocoon trunk. :-)
          Ronald Blaschke created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Ronald Blaschke
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development