XalanJ2
  1. XalanJ2
  2. XALANJ-1844

XMLReaderManager provides neither no way to clear its cache nor it clears its cache by itself

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.7
    • Component/s: Other
    • Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
    • Labels:
      None
    • Environment:
      Operating System: All
      Platform: All

      Description

      Considering a hightly multithreaded system, where threads have a short living
      time, and some of the threads perform XSLT transformation, but only once, the
      JVM soons reaches memory saturation, since XMLReaders are never re-used.

      I would suggest maybe to keep the ThreadLocal mecanism instead of the map +
      ThreadLocal duet which appears redondant to me, but maybe i am missing some point ?

      André

        Issue Links

          Activity

          Andre Doherty created issue -
          Hide
          Henry Zongaro added a comment -

          Thank you for catching that problem! I hadn't thought about the fact that the
          Hashtable would continue to hold a reference to an XMLReader even after the
          only thread that will use that XMLReader has ceased to exist.

          There are circumstances in which a thread will have more than one
          transformation and more than one XMLReader active at one time. The Hashtable
          was intended to prevent a thread from attempting to use the same XMLReader in
          two different places at one time. I will instead use a second ThreadLocal
          variable to keep track of whether an XMLReader is in use.

          Show
          Henry Zongaro added a comment - Thank you for catching that problem! I hadn't thought about the fact that the Hashtable would continue to hold a reference to an XMLReader even after the only thread that will use that XMLReader has ceased to exist. There are circumstances in which a thread will have more than one transformation and more than one XMLReader active at one time. The Hashtable was intended to prevent a thread from attempting to use the same XMLReader in two different places at one time. I will instead use a second ThreadLocal variable to keep track of whether an XMLReader is in use.
          Hide
          Andre Doherty added a comment -

          Created an attachment (id=11168)
          XMLReaderManager proposal

          Show
          Andre Doherty added a comment - Created an attachment (id=11168) XMLReaderManager proposal
          Hide
          Andre Doherty added a comment -

          Hello Henry,
          Thank you for reacting so quickly.

          In order to explain my point of view, i am enclosing a few lines of code.
          Considering the fact that the XMLReader is not thread safe, why not then not
          share an instance at all between threads, hence the manager could uses a static
          threadLocal var to keep a map which is unique for each thread, storing the
          readers used by this thread and marking their use.

          This code is probably not perfect but could be used as a proof of concept

          André

          (file is following/preceding this post)

          Show
          Andre Doherty added a comment - Hello Henry, Thank you for reacting so quickly. In order to explain my point of view, i am enclosing a few lines of code. Considering the fact that the XMLReader is not thread safe, why not then not share an instance at all between threads, hence the manager could uses a static threadLocal var to keep a map which is unique for each thread, storing the readers used by this thread and marking their use. This code is probably not perfect but could be used as a proof of concept André (file is following/preceding this post)
          Hide
          F. Andy Seidl added a comment -

          As implemented, the XMLReaderManager only improves performance of
          threads that use multiple XMLReaders. This is because the XMLReader
          instances are not shared across multiple threads. Further, if such a
          thread creates more than one reader at a time, only the most recently
          created reader available for possible reuse.

          I may be missing something, but it seems to make more sense to share
          XMLReader instances across threads, much the same way a database
          connection pool shares database connections across threads. Maybe. It
          might make more sense to not cache readers at all.

          How much of a performance bottleneck is created by not reusing XMLReader
          instances? How expensive is reader creation compared to the cost of
          managing the reader cache?

          One problem I've noticed is that XMLReader readers often maintain
          references to objects representing the parsed document. When such a reader
          is returned to the cache of unused readers, it carries with it the
          previously parsed document. If this document is large, the memory
          cost of caching the reader is large. In an application such as a server
          with hundreds of processor threads performing transformations, it can
          result in significant memory costs to maintain the reader cache.

          It may make more sense to create an optional mechanism for reader
          pooling that can be used in circumstances where it is really needed; and
          not used where it would create undesirable memory costs or where it
          would offer little performance improvement.

          Show
          F. Andy Seidl added a comment - As implemented, the XMLReaderManager only improves performance of threads that use multiple XMLReaders. This is because the XMLReader instances are not shared across multiple threads. Further, if such a thread creates more than one reader at a time, only the most recently created reader available for possible reuse. I may be missing something, but it seems to make more sense to share XMLReader instances across threads, much the same way a database connection pool shares database connections across threads. Maybe. It might make more sense to not cache readers at all. How much of a performance bottleneck is created by not reusing XMLReader instances? How expensive is reader creation compared to the cost of managing the reader cache? One problem I've noticed is that XMLReader readers often maintain references to objects representing the parsed document. When such a reader is returned to the cache of unused readers, it carries with it the previously parsed document. If this document is large, the memory cost of caching the reader is large. In an application such as a server with hundreds of processor threads performing transformations, it can result in significant memory costs to maintain the reader cache. It may make more sense to create an optional mechanism for reader pooling that can be used in circumstances where it is really needed; and not used where it would create undesirable memory costs or where it would offer little performance improvement.
          Hide
          Henry Zongaro added a comment -

          The performance benefit of reusing XMLReader instances is significant.
          However, I tend to agree that the use of a cache should be optional for the
          reasons stated.

          Show
          Henry Zongaro added a comment - The performance benefit of reusing XMLReader instances is significant. However, I tend to agree that the use of a cache should be optional for the reasons stated.
          Hide
          F. Andy Seidl added a comment -

          Created an attachment (id=11360)
          XMLReaderManager using a thread-local stack of available readers and no "in-use" hashtable.

          Show
          F. Andy Seidl added a comment - Created an attachment (id=11360) XMLReaderManager using a thread-local stack of available readers and no "in-use" hashtable.
          Hide
          F. Andy Seidl added a comment -

          The previous attachement (id=11360) is is a version of XMLReaderManager that
          uses a thread-local stack to track available readers. Since the stack
          contains only available readers, they are, by definition, not "in use". Thus,
          the "in use" map can be eliminated, which fixes the memory leak issue created
          by that map.

          It still may be better to use a cross-thread pool of available readers with a
          clean-up thread that periodically discards cached readers that have been idle
          too long. This implementation still has the potential for a long-lived thread
          to build up many cached readers that are not longer needed without a way to
          clean them up, but at least they will not persist once owner thread dies.

          Show
          F. Andy Seidl added a comment - The previous attachement (id=11360) is is a version of XMLReaderManager that uses a thread-local stack to track available readers. Since the stack contains only available readers, they are, by definition, not "in use". Thus, the "in use" map can be eliminated, which fixes the memory leak issue created by that map. It still may be better to use a cross-thread pool of available readers with a clean-up thread that periodically discards cached readers that have been idle too long. This implementation still has the potential for a long-lived thread to build up many cached readers that are not longer needed without a way to clean them up, but at least they will not persist once owner thread dies.
          Hide
          Morris Kwan added a comment -

          Created an attachment (id=12419)
          Proposed patch

          Show
          Morris Kwan added a comment - Created an attachment (id=12419) Proposed patch
          Hide
          Morris Kwan added a comment -

          I have come up with a simple patch, which just remove the XMLReaders from the
          m_isUse hashtable upon the call of releaseXMLReader. This will prevent the
          single static instance of XMLReaderManager from keeping references to died
          XMLReaders.

          I think Andy's solution is also OK, although I slightly prefer the new patch,
          which makes smaller changes to the application logic and is easier to verify.

          Show
          Morris Kwan added a comment - I have come up with a simple patch, which just remove the XMLReaders from the m_isUse hashtable upon the call of releaseXMLReader. This will prevent the single static instance of XMLReaderManager from keeping references to died XMLReaders. I think Andy's solution is also OK, although I slightly prefer the new patch, which makes smaller changes to the application logic and is easier to verify.
          Hide
          Morris Kwan added a comment -

          The patch was applied.

          Show
          Morris Kwan added a comment - The patch was applied.
          Serge Knystautas made changes -
          Field Original Value New Value
          issue.field.bugzillaimportkey 28082 26753
          Hide
          gian maria romanato added a comment -

          Hi all, and thanks for your excellent products.

          This bug represented a real headache for my company.
          We have a number of installation (J2EE applications) that suffered from OutOfMemory exceptions after a few-days-execution, or during a stress-test session.

          We have been looking for a problem in our code for weeks (the application implements some caching), and only recently we came up with a nice utility called HeapRoots (http://www.alphaworks.ibm.com/tech/heaproots), which allowed us to examine one of the memory dumps generated by Websphere Application Server 5.1 after an OutOfMemory.

          Well, our application uses a lot of XSLs and XML transformations and after a 24h stress test, the heap dump contained an instance of XMLReaderManager whose size was bigger then 500 Mega Bytes.

          We finally found this bug report, we applied the patch provided in the attachment to Xalan-J 2.6.0 codebase, and we stress-tested again the Application. After more then 24 h of stress testing, the allocated memory for the whole server was less then half of the available ram, whith the garbage collector being invoked every 6-10 seconds compared to the 2 seconds invocation that occurred with the official 2.6 release of Xalan.

          Now we have this home-built version of Xalan that we can of course deliver to our clients, but we would rather prefer an official release from Apache Foundation, containing the mentioned patch.

          This bug report states that the bug, although fixed, is not yet present in any official release.
          Is there any scheduled release to be delivered in few days ? If not, is there a chance that you release a 2.6.1 maintanance version containing this bugfix ?

          Thank you very much for your great work.

          Gian Maria Romanato.
          www.finantix.com

          Show
          gian maria romanato added a comment - Hi all, and thanks for your excellent products. This bug represented a real headache for my company. We have a number of installation (J2EE applications) that suffered from OutOfMemory exceptions after a few-days-execution, or during a stress-test session. We have been looking for a problem in our code for weeks (the application implements some caching), and only recently we came up with a nice utility called HeapRoots ( http://www.alphaworks.ibm.com/tech/heaproots ), which allowed us to examine one of the memory dumps generated by Websphere Application Server 5.1 after an OutOfMemory. Well, our application uses a lot of XSLs and XML transformations and after a 24h stress test, the heap dump contained an instance of XMLReaderManager whose size was bigger then 500 Mega Bytes. We finally found this bug report, we applied the patch provided in the attachment to Xalan-J 2.6.0 codebase, and we stress-tested again the Application. After more then 24 h of stress testing, the allocated memory for the whole server was less then half of the available ram, whith the garbage collector being invoked every 6-10 seconds compared to the 2 seconds invocation that occurred with the official 2.6 release of Xalan. Now we have this home-built version of Xalan that we can of course deliver to our clients, but we would rather prefer an official release from Apache Foundation, containing the mentioned patch. This bug report states that the bug, although fixed, is not yet present in any official release. Is there any scheduled release to be delivered in few days ? If not, is there a chance that you release a 2.6.1 maintanance version containing this bugfix ? Thank you very much for your great work. Gian Maria Romanato. www.finantix.com
          Hide
          J Radecki added a comment -

          This really sucks!

          Show
          J Radecki added a comment - This really sucks!
          Show
          J Radecki added a comment - IBM FixPack http://www-1.ibm.com/support/docview.wss?rs=499&dc=D400&q1=websphere&q2=websphere&q3=jdk&uid=swg24007926&loc=en_US&cs=utf-8&lang=en Thank Daryl & Brian
          Henry Zongaro made changes -
          Link This issue is duplicated by XALANJ-2117 [ XALANJ-2117 ]
          Brian Minchau made changes -
          Fix Version/s 2.7 [ 11080 ]
          Alan Cabrera made changes -
          Link This issue is duplicated by XALANJ-2117 [ XALANJ-2117 ]
          Alan Cabrera made changes -
          Link This issue is duplicated by XALANJ-2117 [ XALANJ-2117 ]
          Mark Thomas made changes -
          Workflow jira [ 36748 ] Default workflow, editable Closed status [ 12569523 ]
          Mark Thomas made changes -
          Workflow Default workflow, editable Closed status [ 12569523 ] jira [ 12592985 ]

            People

            • Assignee:
              Henry Zongaro
              Reporter:
              Andre Doherty
            • Votes:
              4 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:

                Development