Xerces-C++
  1. Xerces-C++
  2. XERCESC-1508

createDocumentType does not release memory when document released/deleted

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0, 2.7.0
    • Fix Version/s: None
    • Component/s: DOM
    • Labels:
      None
    • Environment:
      SUSE Linux 9, Red Hat Advanced Server 3.6, Ubuntu 7.04

      Description

      Hi

      I am writing an aplication that has to create XML documents on the fly for interprocess communications.

      The problem that I have is that the memory is not being released when I use the DOMDocument (documentType) constructor. I have played around with various different ways of releasing the memory including using the DOMDocumentType->release() method but alas no luck. I appreciate that this may not seem like a huge issue but as I am looking memory everytime I create a new document I feel that this is an issue to be resolved.

      I have also found a few excepts on the web talking about this issue and there also seems to be a few people who believe this issue to be a case of bad API use:
      http://mail-archives.apache.org/mod_mbox/xerces-c-dev/200505.mbox/%3c87vf5bi9gy.fsf@openinformatics.com%3e
      .. this is followed up by a gentleman named Gareth Reakes from PathenonComputing describing that this is to do with the way that the release methods have not been called .. I do not believe him:
      "Nope, not a real leak. When you free the document the memory pools that were used to new things from are deleted so it cleans up all the things like the attribute list from the creation of an element."

      The below example shows how this is not the case.

      I have created the following example code thaqt fully replicates the issue including taking stats of the memory usage (RSS) after each document creation and deletion. Please note that in main() I have a commented out version of the call to create the document as this shows how the standard constructor for the domdocument creates without any memory leaks

      ///////////////////////////////////////////
      #include <xercesc/util/PlatformUtils.hpp>
      #include <xercesc/parsers/XercesDOMParser.hpp> // to get the xmlstring
      #include <xercesc/dom/DOM.hpp>

      #include <iostream>
      #include <fcntl.h>
      #include <unistd.h>

      /////////////////////////////////////////////////////////////////////
      // pre decl
      XERCES_CPP_NAMESPACE::DOMDocument *createNewDoc();
      XERCES_CPP_NAMESPACE::DOMDocument *createNewDocWithType();
      int getMemorySize();

      /////////////////////////////////////////////////////////////////////
      int main()
      {
      std::cout << "**************** starting" << std::endl;
      XERCES_CPP_NAMESPACE::XMLPlatformUtils::Initialize();

      for( int i = 0; i < 1000; ++i )

      { //XERCES_CPP_NAMESPACE::DOMDocument * doc = createNewDoc(); XERCES_CPP_NAMESPACE::DOMDocument * doc = createNewDocWithType(); std::cout << " - Doing something with the document" << std::endl; delete doc; usleep(100000); std::cout << "* Memory (rss): " << getMemorySize() << std::endl; }

      XERCES_CPP_NAMESPACE::XMLPlatformUtils::Terminate();
      std::cout << "**************** ending" << std::endl;
      return 0;
      }

      /////////////////////////////////////////////////////////////////////
      XERCES_CPP_NAMESPACE::DOMDocument *createNewDocWithType()

      { // set up strings XMLCh *tName = XERCES_CPP_NAMESPACE::XMLString::transcode( "typeName" ); XMLCh *pub = XERCES_CPP_NAMESPACE::XMLString::transcode( "pubId" ); XMLCh *sys = XERCES_CPP_NAMESPACE::XMLString::transcode( "sysId" ); XMLCh *name = XERCES_CPP_NAMESPACE::XMLString::transcode( "rootName" ); XERCES_CPP_NAMESPACE::DOMImplementation *imp = XERCES_CPP_NAMESPACE::DOMImplementation::getImplementation(); // create the document type and the doc XERCES_CPP_NAMESPACE::DOMDocumentType *type = imp->createDocumentType( tName, pub, sys ); XERCES_CPP_NAMESPACE::DOMDocument *tmp = imp->createDocument( NULL, name, type ); std::cout << " - created a doc" << std::endl; // clean up XERCES_CPP_NAMESPACE::XMLString::release(&tName); XERCES_CPP_NAMESPACE::XMLString::release(&name); XERCES_CPP_NAMESPACE::XMLString::release(&pub); XERCES_CPP_NAMESPACE::XMLString::release(&sys); std::cout << " - returning doc" << std::endl; return tmp;; }

      /////////////////////////////////////////////////////////////////////
      XERCES_CPP_NAMESPACE::DOMDocument *createNewDoc()

      { // set up strings XERCES_CPP_NAMESPACE::DOMImplementation *imp = XERCES_CPP_NAMESPACE::DOMImplementation::getImplementation(); // create the document type and the doc XERCES_CPP_NAMESPACE::DOMDocument *tmp = imp->createDocument(); std::cout << " - returning doc" << std::endl; return tmp;; }

      /////////////////////////////////////////////////////////////////////
      int getMemorySize()
      {
      char fileName[64];
      int fd;

      snprintf( fileName, sizeof(fileName), "/proc/%d/statm", (int)getpid() );
      fd = open(fileName, O_RDONLY);
      if( fd == -1 )

      { return -1; }

      char memInfo[128];
      int rval = read( fd, memInfo, sizeof(memInfo)-1 );
      close(fd);
      if( rval <= 0 )
      { return -1; }

      memInfo[rval] = '\0';
      int rss;
      rval = sscanf( memInfo, "%*d %d", &rss );
      if( rval != 1 )

      { return -1; }

      return rss * getpagesize() / 1024;

      }

      /////////////////////////////////////////////////////////////////////

      1. play.cpp
        3 kB
        peter suggitt
      2. DOMDocumentType_leak.cc
        0.6 kB
        Hal Black

        Activity

        Hide
        peter suggitt added a comment -

        Valgrind output:

        ==28513== TRANSLATE: 0x4073E610 redirected to 0x40617F1E
        ==28513==
        ==28513== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
        ==28513== malloc/free: in use at exit: 680 bytes in 10 blocks.
        ==28513== malloc/free: 302 allocs, 292 frees, 731898 bytes allocated.
        ==28513==
        ==28513== searching for pointers to 10 not-freed blocks.
        ==28513== checked 8877280 bytes.
        ==28513==
        ==28513== 680 bytes in 10 blocks are still reachable in loss record 1 of 1
        ==28513== at 0x4002CC6B: __builtin_new (in /usr/lib/valgrind/vgskin_memcheck.so)
        ==28513== by 0x4002CCC2: operator new(unsigned) (in /usr/lib/valgrind/vgskin_memcheck.so)
        ==28513== by 0x403AB3C3: xercesc_2_6::DOMImplementationImpl::createDocumentType(unsigned short const*, unsigned short const*, unsigned short const*) (DOMImplementationImpl.cpp:205)
        ==28513== by 0x804A4EA: createNewDocWithType() (play.cpp:47)
        ==28513==
        ==28513== LEAK SUMMARY:
        ==28513== definitely lost: 0 bytes in 0 blocks.
        ==28513== possibly lost: 0 bytes in 0 blocks.
        ==28513== still reachable: 680 bytes in 10 blocks.
        ==28513== suppressed: 0 bytes in 0 blocks.
        ==28513==
        -28513- TT/TC: 0 tc sectors discarded.
        -28513- 2680 chainings, 0 unchainings.
        -28513- translate: new 4903 (83024 -> 1091150; ratio 131:10)
        -28513- discard 0 (0 -> 0; ratio 0:10).
        -28513- dispatch: 100000 jumps (bb entries), of which 99608 (99%) were unchained.
        -28513- 110/17695 major/minor sched events. 7761 tt_fast misses.
        -28513- reg-alloc: 818 t-req-spill, 203724+4429 orig+spill uis, 24712 total-reg-r.
        -28513- sanity: 15 cheap, 1 expensive checks.
        -28513- ccalls: 25847 C calls, 56% saves+restores avoided (85552 bytes)
        -28513- 33847 args, avg 0.88 setup instrs each (8098 bytes)
        -28513- 0% clear the stack (77541 bytes)
        -28513- 8452 retvals, 35% of reg-reg movs avoided (5760 bytes)

        Show
        peter suggitt added a comment - Valgrind output: ==28513== TRANSLATE: 0x4073E610 redirected to 0x40617F1E ==28513== ==28513== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ==28513== malloc/free: in use at exit: 680 bytes in 10 blocks. ==28513== malloc/free: 302 allocs, 292 frees, 731898 bytes allocated. ==28513== ==28513== searching for pointers to 10 not-freed blocks. ==28513== checked 8877280 bytes. ==28513== ==28513== 680 bytes in 10 blocks are still reachable in loss record 1 of 1 ==28513== at 0x4002CC6B: __builtin_new (in /usr/lib/valgrind/vgskin_memcheck.so) ==28513== by 0x4002CCC2: operator new(unsigned) (in /usr/lib/valgrind/vgskin_memcheck.so) ==28513== by 0x403AB3C3: xercesc_2_6::DOMImplementationImpl::createDocumentType(unsigned short const*, unsigned short const*, unsigned short const*) (DOMImplementationImpl.cpp:205) ==28513== by 0x804A4EA: createNewDocWithType() (play.cpp:47) ==28513== ==28513== LEAK SUMMARY: ==28513== definitely lost: 0 bytes in 0 blocks. ==28513== possibly lost: 0 bytes in 0 blocks. ==28513== still reachable: 680 bytes in 10 blocks. ==28513== suppressed: 0 bytes in 0 blocks. ==28513== - 28513 - TT/TC: 0 tc sectors discarded. - 28513 - 2680 chainings, 0 unchainings. - 28513 - translate: new 4903 (83024 -> 1091150; ratio 131:10) - 28513 - discard 0 (0 -> 0; ratio 0:10). - 28513 - dispatch: 100000 jumps (bb entries), of which 99608 (99%) were unchained. - 28513 - 110/17695 major/minor sched events. 7761 tt_fast misses. - 28513 - reg-alloc: 818 t-req-spill, 203724+4429 orig+spill uis, 24712 total-reg-r. - 28513 - sanity: 15 cheap, 1 expensive checks. - 28513 - ccalls: 25847 C calls, 56% saves+restores avoided (85552 bytes) - 28513 - 33847 args, avg 0.88 setup instrs each (8098 bytes) - 28513 - 0% clear the stack (77541 bytes) - 28513 - 8452 retvals, 35% of reg-reg movs avoided (5760 bytes)
        Hide
        Gareth Reakes added a comment -

        Hi, I cannot see anything wrong with your code. I have had a quick look at the implementation and that looks fine at first look. I will take a more in depth look tomorrow. Can you confirm that this happens with 2.7? Also, have you tried creating the document type via the factories in the document rather than the implementation? Does that make any difference?

        Cheers,

        Gareth

        Show
        Gareth Reakes added a comment - Hi, I cannot see anything wrong with your code. I have had a quick look at the implementation and that looks fine at first look. I will take a more in depth look tomorrow. Can you confirm that this happens with 2.7? Also, have you tried creating the document type via the factories in the document rather than the implementation? Does that make any difference? Cheers, Gareth
        Hide
        peter suggitt added a comment -

        Hi,

        I have not tried with the 2.7 implementation. Was there a specific fix for this issue in that version?

        I have not tried using the factories from the document class. I have had look around the header files for another way around this but I have not found one that will allow me to set the sys and pub ids as I am trying to do above. The non standard extensions in the DOMDocument header file only show a reference back to the
        virtual DOMDocumentType *createDocumentType(const XMLCh *name) = 0;
        method. If there is a more correct way to approach the document type construction through the domument API then I am willing to give it a go and feed back any results (please advise).

        This still does not resolve the underlying issue of a memory leak in there. Realistically there should be a way to release this object once we have finished using the document implicitly through the DOMDocument destructor/realease mecahnsim. Destroying the type when the document is in existence breaks your model of memory management.

        Thank you for the swift response in this matter.

        Sincerely
        Peter Suggitt

        Show
        peter suggitt added a comment - Hi, I have not tried with the 2.7 implementation. Was there a specific fix for this issue in that version? I have not tried using the factories from the document class. I have had look around the header files for another way around this but I have not found one that will allow me to set the sys and pub ids as I am trying to do above. The non standard extensions in the DOMDocument header file only show a reference back to the virtual DOMDocumentType *createDocumentType(const XMLCh *name) = 0; method. If there is a more correct way to approach the document type construction through the domument API then I am willing to give it a go and feed back any results (please advise). This still does not resolve the underlying issue of a memory leak in there. Realistically there should be a way to release this object once we have finished using the document implicitly through the DOMDocument destructor/realease mecahnsim. Destroying the type when the document is in existence breaks your model of memory management. Thank you for the swift response in this matter. Sincerely Peter Suggitt
        Hide
        Gareth Reakes added a comment -

        Hi,

        >>I have not tried with the 2.7 implementation. Was there a specific fix for this issue in that version?

        There were many fixes in 2.7, I don't know if this was fixed. The first thing that I will have to to is try with the latest version and I was hopeing you would do that

        >>The non standard extensions in the DOMDocument header file only show a reference back to the ...

        The non standard extensions are overriden in the implementation (DOMDocumentImpl). Its not that its more correct, but it would be interesting to see if the same behaviour occurs.

        >>Destroying the type when the document is in existence breaks your model of memory management.

        This should not happen. When you set the documentType on the document, it takes resposibility for deleting it when release is called. Take a look at in DOMDocumentTypeImpl::release() (this is called from the DOMDocumentImpl::release). If you have the source code then I would suggest you put some debug in

        void DOMDocumentTypeImpl::release()
        {
        if (fNode.isOwned()) {
        if (fNode.isToBeReleased()) {
        // we are calling from documnet.release() which will notify the user data handler
        if (fIsCreatedFromHeap)

        { DOMDocumentType* docType = this; delete docType; }

        }

        then you will see if the DocumentType is being deleted. If it is not then we are nearer to working out what is going on.

        Show
        Gareth Reakes added a comment - Hi, >>I have not tried with the 2.7 implementation. Was there a specific fix for this issue in that version? There were many fixes in 2.7, I don't know if this was fixed. The first thing that I will have to to is try with the latest version and I was hopeing you would do that >>The non standard extensions in the DOMDocument header file only show a reference back to the ... The non standard extensions are overriden in the implementation (DOMDocumentImpl). Its not that its more correct, but it would be interesting to see if the same behaviour occurs. >>Destroying the type when the document is in existence breaks your model of memory management. This should not happen. When you set the documentType on the document, it takes resposibility for deleting it when release is called. Take a look at in DOMDocumentTypeImpl::release() (this is called from the DOMDocumentImpl::release). If you have the source code then I would suggest you put some debug in void DOMDocumentTypeImpl::release() { if (fNode.isOwned()) { if (fNode.isToBeReleased()) { // we are calling from documnet.release() which will notify the user data handler if (fIsCreatedFromHeap) { DOMDocumentType* docType = this; delete docType; } } then you will see if the DocumentType is being deleted. If it is not then we are nearer to working out what is going on.
        Hide
        peter suggitt added a comment -

        >> There were many fixes in 2.7, I don't know if this was fixed. The first thing that I will have to to is try with the latest version and I was hopeing you would do that

        I do not have a 2.7 version available at the moment. And after taking a while to get the 2.6 version running (had to download some extra source code for compilation) I would rather not. Do you have a verion of 2.7 that this test app (see attachment) can be run against?

        I take it this is something that you will be looking into? (The above comment suggests that you may we waiting for some confirmation from me).

        Based on the code i have supplied can you suggest an alternative approach to the construction beyond the following:
        XERCES_CPP_NAMESPACE::DOMImplementation *imp = XERCES_CPP_NAMESPACE::DOMImplementation::getImplementation();
        // create the document type and the doc
        XERCES_CPP_NAMESPACE::DOMDocumentType *type = imp->createDocumentType( tName, pub, sys );
        XERCES_CPP_NAMESPACE::DOMDocument *tmp = imp->createDocument( NULL, name, type );

        I do not actually see an alternative that allows me so explicitly set the type, public id and system id. perhaps I have missed something.

        Show
        peter suggitt added a comment - >> There were many fixes in 2.7, I don't know if this was fixed. The first thing that I will have to to is try with the latest version and I was hopeing you would do that I do not have a 2.7 version available at the moment. And after taking a while to get the 2.6 version running (had to download some extra source code for compilation) I would rather not. Do you have a verion of 2.7 that this test app (see attachment) can be run against? I take it this is something that you will be looking into? (The above comment suggests that you may we waiting for some confirmation from me). Based on the code i have supplied can you suggest an alternative approach to the construction beyond the following: XERCES_CPP_NAMESPACE::DOMImplementation *imp = XERCES_CPP_NAMESPACE::DOMImplementation::getImplementation(); // create the document type and the doc XERCES_CPP_NAMESPACE::DOMDocumentType *type = imp->createDocumentType( tName, pub, sys ); XERCES_CPP_NAMESPACE::DOMDocument *tmp = imp->createDocument( NULL, name, type ); I do not actually see an alternative that allows me so explicitly set the type, public id and system id. perhaps I have missed something.
        Hide
        Gareth Reakes added a comment -

        I can get download a version of 2.7 and see what happens. Are you intending to put the debug in I suggested?

        The poorly documeted

        virtual DOMDocumentType* DOMDocument::createDocumentType ( const XMLCh * qName,
        const XMLCh * ,
        const XMLCh *
        ) [virtual]

        does what you want I think.

        Show
        Gareth Reakes added a comment - I can get download a version of 2.7 and see what happens. Are you intending to put the debug in I suggested? The poorly documeted virtual DOMDocumentType* DOMDocument::createDocumentType ( const XMLCh * qName, const XMLCh * , const XMLCh * ) [virtual] does what you want I think.
        Hide
        peter suggitt added a comment -

        Hi

        I have added in the debug output as described and still have a leak in there.

        Even with all the debug in there the docType->release() method is just not being called at all when you simply delete the doc as in the attached test harness.

        Looking at the DOMDocumentImpl::~DOMDocumentImpl() destructor it seems that there is absolutely no mention of the document type in there. However there is a DOMDocumentImpl::release() method that does free up the document type but again just calling that will result in leaks.

        This kind of suggests that there is more than one leak in there.

        Would it not be better to have a constructor for the document that simply takes in the parameters of the document type instead of the document type pointer so that the document can be fully responsible for the doc type memory allocation?

        Sincerely
        Pete Suggitt

        Show
        peter suggitt added a comment - Hi I have added in the debug output as described and still have a leak in there. Even with all the debug in there the docType->release() method is just not being called at all when you simply delete the doc as in the attached test harness. Looking at the DOMDocumentImpl::~DOMDocumentImpl() destructor it seems that there is absolutely no mention of the document type in there. However there is a DOMDocumentImpl::release() method that does free up the document type but again just calling that will result in leaks. This kind of suggests that there is more than one leak in there. Would it not be better to have a constructor for the document that simply takes in the parameters of the document type instead of the document type pointer so that the document can be fully responsible for the doc type memory allocation? Sincerely Pete Suggitt
        Hide
        peter suggitt added a comment -

        Hi Gareth

        One thing I did forget to mention was that I did have a play with the DOMDocumentType* DOMDocument::createDocumentType but I do not think this helps much with the test harness illustration due to the fact that you need a document in place to then create the document type .. eg
        1) create an empty document
        2) create a document type from that document
        3) use the document type to create a new document.

        ---> abort!

        One thing that I have not seen anywhere is a doc type accessor (setter) so that I could do:
        doc *d = imp->createDocument();
        d.setDocumentType( d->createDocumentType(name, sys, pub) );

        Obviously you want the doc type when you instantiate the doc rather than after the fact.

        .. or am I off keel here?

        Your thoughts will be greatly appreciated.

        Sincerely
        Pete Suggitt

        Show
        peter suggitt added a comment - Hi Gareth One thing I did forget to mention was that I did have a play with the DOMDocumentType* DOMDocument::createDocumentType but I do not think this helps much with the test harness illustration due to the fact that you need a document in place to then create the document type .. eg 1) create an empty document 2) create a document type from that document 3) use the document type to create a new document. ---> abort! One thing that I have not seen anywhere is a doc type accessor (setter) so that I could do: doc *d = imp->createDocument(); d.setDocumentType( d->createDocumentType(name, sys, pub) ); Obviously you want the doc type when you instantiate the doc rather than after the fact. .. or am I off keel here? Your thoughts will be greatly appreciated. Sincerely Pete Suggitt
        Hide
        Gareth Reakes added a comment -

        I have had a good look at the code now and interestingly the doctype is only deleted in the document->release method, not in the destructor. Release does however delete the document. Calling release should solve your problem for now. I will move the deletion of the DT into the destructor and document the difference in behaviour of calling each one (that is that release will also call the user data handlers and therefore be slower). Please give this a try and report back so I can see if this is fixed.

        Cheers,

        Gareth

        Show
        Gareth Reakes added a comment - I have had a good look at the code now and interestingly the doctype is only deleted in the document->release method, not in the destructor. Release does however delete the document. Calling release should solve your problem for now. I will move the deletion of the DT into the destructor and document the difference in behaviour of calling each one (that is that release will also call the user data handlers and therefore be slower). Please give this a try and report back so I can see if this is fixed. Cheers, Gareth
        Hide
        peter suggitt added a comment -

        Gareth hi

        This was what I saw when I was looking at it too. What I also found was that when the release method is called we also have a memory leak.

        My gut feel on this is that there is an identified leak in the delete doc approach and an unidentified one in the doc->release approach.

        I tried both ways last night and even changed it so that the code first calls release and then delete. In this second 'test' I saw that the call to free threw issues. Either way (the/a) leak remains.

        Have you run the above test harness in your investigations. I believe you will find it quite a useful incite into this issue.

        Sincerely
        Pete

        Show
        peter suggitt added a comment - Gareth hi This was what I saw when I was looking at it too. What I also found was that when the release method is called we also have a memory leak. My gut feel on this is that there is an identified leak in the delete doc approach and an unidentified one in the doc->release approach. I tried both ways last night and even changed it so that the code first calls release and then delete. In this second 'test' I saw that the call to free threw issues. Either way (the/a) leak remains. Have you run the above test harness in your investigations. I believe you will find it quite a useful incite into this issue. Sincerely Pete
        Hide
        Gareth Reakes added a comment -

        Hi,

        What you see there is not a memory leak. It seems more memory is being taken up because we use a global document to create things in the DT. This is then released when we call terminate. If you call terminate at the end of your test program and then the memory usage call again you will see that is drops to a constant value no matter how many times the loop in the test program is called. I agree that this is not desirable. I also agree that the model of the DT does not fit our memory model. Can you confirm that what I say fits with what you see? I will have a think about and talk to others about the best way to make this nicer.

        Cheers,

        Gareth

        Show
        Gareth Reakes added a comment - Hi, What you see there is not a memory leak. It seems more memory is being taken up because we use a global document to create things in the DT. This is then released when we call terminate. If you call terminate at the end of your test program and then the memory usage call again you will see that is drops to a constant value no matter how many times the loop in the test program is called. I agree that this is not desirable. I also agree that the model of the DT does not fit our memory model. Can you confirm that what I say fits with what you see? I will have a think about and talk to others about the best way to make this nicer. Cheers, Gareth
        Hide
        peter suggitt added a comment -

        Hi Gareth

        Terminate .. as in XERCES_CPP_NAMESPACE::XMLPlatformUtils::Terminate()?

        This would certainly not fit in with any design that requires an ongoing parsing function.

        In the implementation that I have been using I create a singleton for xerces that manages all xml functionality. I create 1 parser and 1 writer and utilise them throughout the lifespan of the application (intended to be in the scale of days/months). This will only release the memory when the singleton calls the std::atexit() method (thus cleaning the singleton object and calling terminate on the platform). This is (I believe) how this should be implemented. If I were to follow your above suggestion, I would have to create a new parser (pls associated objects) for every parse that I perform (I may as well render it to static functionality) so I do not see this as a viable work around.

        I think the solution lies in the management of the DT address space, as you have stated above.

        For the record I will change the test-harness above the test with the terminate function to prove that terminate will release the memory and come back to you. But I will state that this is not a valid workaround for the issues I am seeing.

        I look forward to the feedback from your discussions.

        Kind regards
        Pete

        Show
        peter suggitt added a comment - Hi Gareth Terminate .. as in XERCES_CPP_NAMESPACE::XMLPlatformUtils::Terminate()? This would certainly not fit in with any design that requires an ongoing parsing function. In the implementation that I have been using I create a singleton for xerces that manages all xml functionality. I create 1 parser and 1 writer and utilise them throughout the lifespan of the application (intended to be in the scale of days/months). This will only release the memory when the singleton calls the std::atexit() method (thus cleaning the singleton object and calling terminate on the platform). This is (I believe) how this should be implemented. If I were to follow your above suggestion, I would have to create a new parser (pls associated objects) for every parse that I perform (I may as well render it to static functionality) so I do not see this as a viable work around. I think the solution lies in the management of the DT address space, as you have stated above. For the record I will change the test-harness above the test with the terminate function to prove that terminate will release the memory and come back to you. But I will state that this is not a valid workaround for the issues I am seeing. I look forward to the feedback from your discussions. Kind regards Pete
        Hide
        peter suggitt added a comment -

        Gareth hi

        I have indeed tried out the suggested approach and have changed the main method of the test-harness as below:
        std::cout << "**************** starting" << std::endl;

        for( int i = 0; i < 100; ++i )

        { XERCES_CPP_NAMESPACE::XMLPlatformUtils::Initialize(); XERCES_CPP_NAMESPACE::DOMDocument * doc = createNewDocWithType(); std::cout << " - Doing something with the document" << std::endl; doc->release(); usleep(100000); std::cout << "* Memory (rss): " << getMemorySize() << std::endl; XERCES_CPP_NAMESPACE::XMLPlatformUtils::Terminate(); std::cout << "* Memory (rss): " << getMemorySize() << std::endl; }

        std::cout << "**************** ending" << std::endl;

        This turns out to be rock-solid when executed, as you have suggested

        One thing that I have noticed is that if instead of calling release on the document I simply delete the doc (replacing 'doc->release' with 'delete doc') from the heap then there is a leak ... ????? The memory size increases from 2620 -> 2644 over 100 iterations.

        This kind of suggests that there is a wider issue here with the way that memory is allocated and released (as we have both agreed above).

        I do not think there is a great deal more that I can add to this now, if there is pls do let me know and I will do my best to help.

        I look forward to your thoughts in this matter.

        Kind regards
        Pete

        Show
        peter suggitt added a comment - Gareth hi I have indeed tried out the suggested approach and have changed the main method of the test-harness as below: std::cout << "**************** starting" << std::endl; for( int i = 0; i < 100; ++i ) { XERCES_CPP_NAMESPACE::XMLPlatformUtils::Initialize(); XERCES_CPP_NAMESPACE::DOMDocument * doc = createNewDocWithType(); std::cout << " - Doing something with the document" << std::endl; doc->release(); usleep(100000); std::cout << "* Memory (rss): " << getMemorySize() << std::endl; XERCES_CPP_NAMESPACE::XMLPlatformUtils::Terminate(); std::cout << "* Memory (rss): " << getMemorySize() << std::endl; } std::cout << "**************** ending" << std::endl; This turns out to be rock-solid when executed, as you have suggested One thing that I have noticed is that if instead of calling release on the document I simply delete the doc (replacing 'doc->release' with 'delete doc') from the heap then there is a leak ... ????? The memory size increases from 2620 -> 2644 over 100 iterations. This kind of suggests that there is a wider issue here with the way that memory is allocated and released (as we have both agreed above). I do not think there is a great deal more that I can add to this now, if there is pls do let me know and I will do my best to help. I look forward to your thoughts in this matter. Kind regards Pete
        Hide
        peter suggitt added a comment -

        Hi

        I was wondering if there was any progress with this issue since the last communication?

        Sincerely
        Pete

        Show
        peter suggitt added a comment - Hi I was wondering if there was any progress with this issue since the last communication? Sincerely Pete
        Hide
        Gareth Reakes added a comment -

        Hi Pete,

        Yes there has been. Alby and I had a think and I am implementing a fix where the DT is just stored normally in the memory manager rather than being in a global document. Looking at the code there are several problems, including threading issues and lack of use of a user memory manager. I am on a sales run at the moment so don't have much time but hope to have this fixed in the next couple of weeks. If you would like to help out the give us a shout.

        Gareth

        Show
        Gareth Reakes added a comment - Hi Pete, Yes there has been. Alby and I had a think and I am implementing a fix where the DT is just stored normally in the memory manager rather than being in a global document. Looking at the code there are several problems, including threading issues and lack of use of a user memory manager. I am on a sales run at the moment so don't have much time but hope to have this fixed in the next couple of weeks. If you would like to help out the give us a shout. Gareth
        Hide
        peter suggitt added a comment -

        Hiya

        Thank you for the kind invitation to participate in the implementation of this fix. Unfortunately I am going to be very busy completing a Masters Dissertation (it was in the implementation of this that I actualy found the issue) for the next month.

        I not be able to look at this until the new year but after which I will gladly lend a hand if the bug is still outstanding and becoming budensome.

        Thanks again
        Pete

        Show
        peter suggitt added a comment - Hiya Thank you for the kind invitation to participate in the implementation of this fix. Unfortunately I am going to be very busy completing a Masters Dissertation (it was in the implementation of this that I actualy found the issue) for the next month. I not be able to look at this until the new year but after which I will gladly lend a hand if the bug is still outstanding and becoming budensome. Thanks again Pete
        Hide
        Hal Black added a comment -

        This bug is still present in 2.7.0 and latest nightly build (xerces-c_20070518224153)

        Show
        Hal Black added a comment - This bug is still present in 2.7.0 and latest nightly build (xerces-c_20070518224153)
        Hide
        Hal Black added a comment -

        Simpler test case

        Show
        Hal Black added a comment - Simpler test case
        Hide
        Hal Black added a comment -

        I ran a few more tests, but I think you guys have a handle on it. Here's a quick summary:
        Leak:
        impl->createDocumentType()->release()

        Leak:
        impl->createDocument(0,0,impl->createDocumentType())->release() [DOMDocumentType->release() *IS* called]

        No Leak:
        doc = impl->createDocument(0,0,0)
        doc->createDocumentType()
        doc->release()

        Terminate:
        doc1 = impl->createDocument(0,0,0)
        docType = doc1->createDocumentType()
        impl->createDocument(0,0,docType)

        [I even tried to trick the above with cloneNode, to no avail.]

        Segfault:
        docType = impl->createDocument(0,0,0)
        doc1 = impl->createDocument(0,0,docType)->release()
        doc2 = impl->createDocument(0,0,docType)->release() (kaboom)

        So it looks like there is some hybrid of keeping the part of the ownership of docType with the document, and part in the DOM implementation global document? I looked at the source briefly and it was too invovled for me to figure out.

        Anyone have any bright ideas for a workaround?

        Show
        Hal Black added a comment - I ran a few more tests, but I think you guys have a handle on it. Here's a quick summary: Leak: impl->createDocumentType()->release() Leak: impl->createDocument(0,0,impl->createDocumentType())->release() [DOMDocumentType->release() *IS* called] No Leak: doc = impl->createDocument(0,0,0) doc->createDocumentType() doc->release() Terminate: doc1 = impl->createDocument(0,0,0) docType = doc1->createDocumentType() impl->createDocument(0,0,docType) [I even tried to trick the above with cloneNode, to no avail.] Segfault: docType = impl->createDocument(0,0,0) doc1 = impl->createDocument(0,0,docType)->release() doc2 = impl->createDocument(0,0,docType)->release() (kaboom) So it looks like there is some hybrid of keeping the part of the ownership of docType with the document, and part in the DOM implementation global document? I looked at the source briefly and it was too invovled for me to figure out. Anyone have any bright ideas for a workaround?
        Hide
        Guilhem Bonnefille added a comment -

        I still do not understand how to create a Document with a DocumentType without memory leaks (on Solaris 10 with 3.0.1).

        My process produces many many XML documents. As it is "resident", its memory is growing more and more.

        Is there any workaround? Any patch to apply?

        Any help would be really appreciated. Thanks in advance.

        Show
        Guilhem Bonnefille added a comment - I still do not understand how to create a Document with a DocumentType without memory leaks (on Solaris 10 with 3.0.1). My process produces many many XML documents. As it is "resident", its memory is growing more and more. Is there any workaround? Any patch to apply? Any help would be really appreciated. Thanks in advance.
        Hide
        Boris Kolpackov added a comment -

        I chatted to Alberto about this. Here is the log of our conversation. My understanding is that it is better to use doc->createDocumentType() than impl->createDocumentType(). The same conclusion as Hal Black arrived at in his last comment.

        (06:45:40 PM) boseko_1278: Any thoughts on this: https://issues.apache.org/jira/browse/XERCESC-1508
        (06:47:12 PM) Alberto: the DOMDocument expects the DTD to come from its pool
        (06:47:21 PM) Alberto: so it doesn't delete it
        (06:47:46 PM) Alberto: but if the user allocates the DTD from the DOMImplementation object, it simply points to it
        (06:48:14 PM) boseko_1278: So why the memory leak then?
        (06:49:50 PM) Alberto: from what I recall, the leak doesn't exist; the owner of the DTD is the DOMImplementation, and will be freed only when shutting down Xerces
        (06:54:19 PM) boseko_1278: HM, and if we call release() on DocType, it is still not released?
        (06:55:37 PM) Alberto: when you release a node, you are simply putting it in a list for being recycled later
        (06:55:43 PM) Alberto: you are not deleting anything
        (06:56:05 PM) Alberto: DOMDocument::release deletes all the memory in one shot
        (06:56:19 PM) Alberto: but if the DTD didn't come from that memory pool, it is not deleted
        (06:56:33 PM) boseko_1278: Yes, I understand
        (06:57:16 PM) boseko_1278: It just seems to me that DOMDocument and DOMDocumentType are the same "kind" of objects. They are created by DOMImplementation
        (06:57:40 PM) boseko_1278: But DOMDocument you can release with a call to release() while DOMDocumentType you cannot. Strange
        (06:58:10 PM) Alberto: the issue is that a DTD can be created both as part of a document, and as a standalone object
        (06:58:26 PM) boseko_1278: We are talking about different things.
        (06:59:16 PM) boseko_1278: Let me put it this way: If I explicitly call release() on DOMDocumentType object that was created by DOMImplementation, will the memory be freed/reused?
        (07:07:59 PM) Alberto: it's middle way
        (07:09:12 PM) Alberto: if you call release on a DTD created by DOMImplementation, it will delete itself, but the memory used to store the lists of entities and elements comes from a static DOMDocument
        (07:09:25 PM) Alberto: that is deleted at shutdown
        (11:01:09 AM) boseko_1278: So you are saying that some parts of DOMDocumentType will be released and some won't
        (11:02:34 AM) boseko_1278: Are those lists of entities and elements shared among all DOMDocumentTypes?
        (11:10:46 AM) Alberto: the lists are per DTD
        (11:10:46 AM) Alberto: but yes, the major part of the memory will not be released
        (11:12:41 AM) boseko_1278: so, that's a bug then
        (11:13:27 AM) boseko_1278: and there doesn't seem to be any workaround
        (11:23:20 AM) Alberto: well, it's more a design issue
        (11:23:20 AM) Alberto: who wrote it assumed that the DOMImpletation::createDTD was not an often used API
        (11:23:20 AM) Alberto: and that was acceptable to delay the deletion of the memory at shutdown
        (11:23:20 AM) Alberto: so, the workaround is to use the createDTD method of the DOMDocument
        (11:25:13 AM) boseko_1278: Hm, that's sounds easy. Any idea why people don't do it?
        (11:51:25 AM) Alberto: the ones who complain about the problem probably have chosen that approach
        (11:51:25 AM) Alberto: but I don't know what they find valuable in using it

        Show
        Boris Kolpackov added a comment - I chatted to Alberto about this. Here is the log of our conversation. My understanding is that it is better to use doc->createDocumentType() than impl->createDocumentType(). The same conclusion as Hal Black arrived at in his last comment. (06:45:40 PM) boseko_1278: Any thoughts on this: https://issues.apache.org/jira/browse/XERCESC-1508 (06:47:12 PM) Alberto: the DOMDocument expects the DTD to come from its pool (06:47:21 PM) Alberto: so it doesn't delete it (06:47:46 PM) Alberto: but if the user allocates the DTD from the DOMImplementation object, it simply points to it (06:48:14 PM) boseko_1278: So why the memory leak then? (06:49:50 PM) Alberto: from what I recall, the leak doesn't exist; the owner of the DTD is the DOMImplementation, and will be freed only when shutting down Xerces (06:54:19 PM) boseko_1278: HM, and if we call release() on DocType, it is still not released? (06:55:37 PM) Alberto: when you release a node, you are simply putting it in a list for being recycled later (06:55:43 PM) Alberto: you are not deleting anything (06:56:05 PM) Alberto: DOMDocument::release deletes all the memory in one shot (06:56:19 PM) Alberto: but if the DTD didn't come from that memory pool, it is not deleted (06:56:33 PM) boseko_1278: Yes, I understand (06:57:16 PM) boseko_1278: It just seems to me that DOMDocument and DOMDocumentType are the same "kind" of objects. They are created by DOMImplementation (06:57:40 PM) boseko_1278: But DOMDocument you can release with a call to release() while DOMDocumentType you cannot. Strange (06:58:10 PM) Alberto: the issue is that a DTD can be created both as part of a document, and as a standalone object (06:58:26 PM) boseko_1278: We are talking about different things. (06:59:16 PM) boseko_1278: Let me put it this way: If I explicitly call release() on DOMDocumentType object that was created by DOMImplementation, will the memory be freed/reused? (07:07:59 PM) Alberto: it's middle way (07:09:12 PM) Alberto: if you call release on a DTD created by DOMImplementation, it will delete itself, but the memory used to store the lists of entities and elements comes from a static DOMDocument (07:09:25 PM) Alberto: that is deleted at shutdown (11:01:09 AM) boseko_1278: So you are saying that some parts of DOMDocumentType will be released and some won't (11:02:34 AM) boseko_1278: Are those lists of entities and elements shared among all DOMDocumentTypes? (11:10:46 AM) Alberto: the lists are per DTD (11:10:46 AM) Alberto: but yes, the major part of the memory will not be released (11:12:41 AM) boseko_1278: so, that's a bug then (11:13:27 AM) boseko_1278: and there doesn't seem to be any workaround (11:23:20 AM) Alberto: well, it's more a design issue (11:23:20 AM) Alberto: who wrote it assumed that the DOMImpletation::createDTD was not an often used API (11:23:20 AM) Alberto: and that was acceptable to delay the deletion of the memory at shutdown (11:23:20 AM) Alberto: so, the workaround is to use the createDTD method of the DOMDocument (11:25:13 AM) boseko_1278: Hm, that's sounds easy. Any idea why people don't do it? (11:51:25 AM) Alberto: the ones who complain about the problem probably have chosen that approach (11:51:25 AM) Alberto: but I don't know what they find valuable in using it
        Hide
        Guilhem Bonnefille added a comment -

        Yeah!!! It works!!! Many thanks Boris.

        As a reminder, the following code leaks:

        impl->createDocument(0,0,impl->createDocumentType())->release();

        It's equivalent code is:

        doc = impl->createDocument(0,0,0)
        dtd = doc->createDocumentType()
        doc->appendChild(dtd)
        doc->release()

        Note the "append" to have the same behaviour.

        Show
        Guilhem Bonnefille added a comment - Yeah!!! It works!!! Many thanks Boris. As a reminder, the following code leaks: impl->createDocument(0,0,impl->createDocumentType())->release(); It's equivalent code is: doc = impl->createDocument(0,0,0) dtd = doc->createDocumentType() doc->appendChild(dtd) doc->release() Note the "append" to have the same behaviour.
        Hide
        Boris Kolpackov added a comment -

        This is a design issue so I am changing this from Bug to Improvement.

        Show
        Boris Kolpackov added a comment - This is a design issue so I am changing this from Bug to Improvement.

          People

          • Assignee:
            Gareth Reakes
            Reporter:
            peter suggitt
          • Votes:
            3 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development