Uploaded image for project: 'Xerces-C++'
  1. Xerces-C++
  2. XERCESC-2043

String pooling in DOMDocumentImpl is unsafe, particularly on 64-bit platforms



    • Bug
    • Status: Closed
    • Critical
    • Resolution: Duplicate
    • 3.1.1
    • 3.1.2
    • DOM
    • None
    • Linux, RHEL6, likely any 64-bit OS


      There seem to be conditions under which an invalid DOM is generated such that elements contain a prefix set to the entire qualified name of the element and not just the prefix.

      e.g., I had a case where an element called del:Delegate (with del defined up above) ended up such that getPrefix() returned "del:Delegate" and not just "del".

      After much investigation, I traced this to the use of DOMDocumentImpl::getPooledNString when storing the prefix inside DOMElementNSImpl::setName()

      This pooling logic is very odd, because it performs a hash of N characters of the input string, and compares the pool entries it searches based on N characters, but then returns the whole pooled string on a match. The problem is that there's one string pool shared between both the getPooledString() and getPooledNString() methods, even though they operate based on different assumptions. If you get collisions between hash values in the two cases, you can end up returning a string inserted by getPooledString() when you call getPooledNString().

      In my case, del:Delegate was hashed and stored by getPooledString(), and then getPooledNString("del:Delegate", 3) ends up returning "del:Delegate" instead of just inserting and returning "del". This is obviously catastrophic.

      Couple things:

      This algorithm is just plain unsafe no matter what because a hash collision under the right conditions creates an invalid DOM.

      The source of the collisions may be in part because the hash methods in XMLString were written based on XMLSize_t being 32-bits, and now they can be 64-bits. The collision probability may be higher now.

      My suggested fix is to split things so that DOMDocumentImpl carries two pools, one for the getPooledString method and one for getPooledNString, to eliminate any chance of collison resulting in invalid returned strings. The fix for that is quite trivial.

      This is a major bug that essentially breaks the parser under the right conditions in a very unexpected way. I really need a 3.1.2 to get this fixed, and am wiling to contribute to the release process.


        Issue Links



              scantor Scott Cantor
              scantor Scott Cantor
              0 Vote for this issue
              0 Start watching this issue