Uploaded image for project: 'Xerces-C++'
  1. Xerces-C++
  2. XERCESC-825

Thread safety problems in .../util/ and ../util/regx

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Resolution: Fixed
    • 2.1.0
    • 2.2.0
    • None
    • Operating System: Solaris
      Platform: Sun
    • 17516

    Description

      Hi all,

      Firstly a quick caveat: While I'm only running under 2.1.0, I've checked 2.2.0
      at cvs.apache.org and the faults I've found in the code appear to still exist.
      But, since I haven't actually compiled and run under 2.2.0, I'm more than happy
      to be proved wrong...

      I've written an application which contains multiple DOM parsers each running in
      a seperate thread, each doing full validation using a schema that contains regex
      facets. When this application is run on a multi-processor host (SunFire 880)
      fatal errors (core dumps, assertions, etc) are regularly observed.

      From my investigations I believe the fault is at least partially due to errors
      in the thread-saftey of the RangeTokenMap class (.../util/regx/RangeTokenMap.*).

      The single static RangeTokenMap object is created in the first call to the
      static RangeTokenMap::instance() method. The race condition is supposedly
      handled by the compareAndSwap() method which initialises the static fInstance
      pointer only if it's NULL. Even though compareAndSwap is mutex protected, by
      this stage it's already too late.

      Because the race condition results in double construction of the RangeTokenMap
      object, the second instance has to be deleted. Unfortunately the destructor
      contains a line setting the fInstance pointer back to NULL. Since fInstance is
      a static pointer, this clobbers the RangeTokenMap we actually want to use. So,
      by the time we come to return fInstance from RangeTokenMap::instance() it is
      useless...

      I replaced the implementation of RangeTokenMap::instance() with a version that
      only constructs the object once...

      RangeTokenMap* RangeTokenMap::instance()
      {
      static XMLRegisterCleanup instanceCleanup;
      if( !fInstance )
      {
      XMLMutexLock lockInit( &fMutex );
      if( !fInstance )

      { fInstance = new RangeTokenMap(); instanceCleanup.registerCleanup( reinitInstance ); }

      }
      return (fInstance);
      }

      Obviously this change meant fMutex needed to become static.

      I also added mutex protection for the other methods of RangeTokenMap, including
      moving the locking in getRange() to the start of the method. These extra locks
      may have been overkill, but since the methods didn't appear to be obviously
      const I thought it was probably prudent. Can someone who knows the intended
      uses of RangeTokenMap better than I do (i.e. at all) advise me otherwise?

      After rebuilding Xerces with these fixes, the fatal errors became less
      frequent, but didn't disappear completely.

      I grepped around in the source a bit more, and I found the RangeTokenMap
      approach to constructing static objects was reused in the EncodingValidator
      class (.../util/EncodingValidator.*). I made the same single construction and
      method mutexing changes there as well.

      Finally tracked another problem to the use of the RangeTokenMap in the
      TokenFactory class (.../util/regx/TokenFactory.*). The
      TokenFactory::initializeRegistry() method calls multiple methods of the
      (static) RangeTokenMap object, but only if its fRangeInitialized flag is false.
      Since fRangeInitialized wasn't static, every thread that came to initialise the
      TokenFactory did the RangeTokenMap initialisation again.

      I don't fully recall why this multiple initialisation was harmful, as opposed
      to just unecessary, since it's a little while since I was looking at this
      problem. The fix I made was to add a static mutex to TokenFactory and made the
      fRangeInitialized flag static. The mutex is locked before attempting anything
      inside TokenFactory::initializeRegistry()

      After making all three of these changes I haven't had any further problems with
      my app. But, since most of the changes were made without any real understanding
      of what these objects actually do, I have a suspicion that I may have let
      myself in for some more subtle problems in the future.

      Unfortunately I can't provide any test code for these bugs, since my app is too
      tightly bound to our code base to be useful to anyone else and I don't have time
      to create a fully portable version. Besides which, I doubt that test code would
      necessarily guarantee these issues were excercised anyway... Hopefully I've
      provided enough explanation to illustrate the problem without example code.

      Thanks for your time...

      Cheers,
      Andrew

      Attachments

        Activity

          People

            Unassigned Unassigned
            andrew@unico.com.au andrew
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: