|
[
Permlink
| « Hide
]
Stephen Kinser added a comment - 25/Sep/09 05:26 PM
I hit this same problem, though with a CombinedConfiguration made up of multiple XMLConfigurations. I am attaching a small maven project that demonstrates the problem, and fails about 50% of the time.
I also get this problem randomly when multithreaded, though I don't know if this belongs in a different JIRA:
testMultiThreadedAccess(TestReloading) Time elapsed: 0.018 sec <<< FAILURE! Thanks for reporting this. It just so happens that I am also experience problems that are most likely due to reloading not being thread safe and was going to create a test case. It looks like you saved me the effort.
I have created a unit test that fails more reliably and a proposed fix.
The idea is that the lock used by the reload() method is exposed so that external synchronization can be performed. AbstractHierarchicalFileConfiguration now synchronizes at this lock in its getProperty() method. This does not fix all problems with this architecture, e.g. other methods like isEmpty() or containsKey() should be treated in a similar way. But concurrent property reads should now be possible. Oliver, I have created a very similar patch but I am using the reloading lock on a lot more methods such as containsKey(), etc. This is necessary because all these methods are at risk of having the underlying structure change, not just getProperty.
However, CombinedConfiguration seems to be a tougher nut to crack. The test I am using is below and is a variation on the XMLConfiguration test case and was added to TestCombinedConfiguration and fails every time. assertTrue("Property not found", config.getProperty("test.short") != null); Thread testThreads[] = new Thread[5]; for (int i = 0; i < testThreads.length; ++i) { testThreads[i] = new ReloadThread(xml); testThreads[i].start(); }for (int i = 0; i < 2000; i++) { assertTrue("Property not found", config.getProperty("test.short") != null); }for (int i = 0; i < testThreads.length; ++i) { testThreads[i].join(); }} private class ReloadThread extends Thread ReloadThread(FileConfiguration config) { this.config = config; } public void run() } Okay, using the reloading lock for all methods that might be affected is certainly the way to go (I was only too lazy to do this
Just an idea: Would it make sense to have a method like syncReload(Accessor acc) in AbstractFileConfiguration like the following: protected interface Accessor { Object access(FileConfiguration config); } protected Object syncReload(Accessor acc) { synchronized(reloadLock) { reload(); return acc.access(this); } } Maybe this would reduce the clutter in the affected methods a bit and allow a more closure-oriented programming. Now to CombinedConfiguration: AFAICT the problem lies in the getTransformedRoot() method of the helper class ConfigData. We have here the following code: // Copy data of the root node to the new path
rootNode = ConfigurationUtils.convertToHierarchical(getConfiguration(),
getConversionExpressionEngine()).getRootNode();
atParent.appendChildren(rootNode);
atParent.appendAttributes(rootNode);
To be save, this fragment has to be executed with the reload lock of the processed configuration hold. A dirty hack could be to add the getReloadLock() method to the FileConfiguration interface and check whether the configuration implements this interface: if(getConfiguration() instanceof FileConfiguration) { synchronized(((FileConfiguration) getConfiguration()).getReloadLock()) { // Copy data of the root node to the new path ... } } However, changing the FileConfiguration interface is a binary incompatible change. So we might have to introduce an additional interface solely for this purpose. WDYT? I don't understand the Accessor thing. The issue is that various operations have to be synchronized since the underlying structure might change while the operation is being performed. I'm not sure how this construct helps.
With CombinedConfiguration I also have focused on that bit of code and syncrhonized there pretty much the way you have suggested but the use case I posted above still fails. From what I can tell this is because in an XMLConfiguration the root node is cleared and then reloaded. Apparently getTransformedRoot() adds the nodes from the actual HierarchicalConfiguration into the CombinedConfiguration instead of making copies of them. This means that once the reloadLock is released the CombinedConfiguration can be modified asynchronously. The above shouldn't be a problem if it simply isn't allowed to use a Configuration object directly once it is incorporated into a combined configuration. After a few more minutes of thought I think even this restriction won't solve the problem. If two threads access the Combined Configuration, one may get through getRootNode and start working with it. The second thread will detect a reload and start into constructCombinedNode and that will screw up the thread that is working with it even though they will have different "combinedRoot" nodes, because the underlying Nodes from the configurations being combined are being messed with.
I should also mention that I took configtest.tar.gz attached to this issue and tried it with the modifications to lock both getRootNode() and the actual HierarchicalConfiguration in getTransformedRoot() and it made absolutely no difference. It still fails most of the time. I can only think of two ways out of this: I was able to get my test to pass by synchronizing the public methods. This gets even more interesting because of SubnodeConfiguration objects.
I'm beginning to think that AbstractHierarchicalFileConfiguration, CombinedConfiguration and SubnodeConfiguration all need to extend a new class named HierarchicalReloadableConfiguration. This class would provide the reload lock object (or set it to the lock provided by the parent) and then lock all the appropriate methods. Subclasses wouldn't then have to provide overrides for all the methods and would be able to lock with the appropriate lock when they do override methods. A common base class that handles the locking stuff sounds like a good solution.
I fear, in its current state the code is hardly maintainable. A lot of methods have to aware of the reloading functionality, and if a synchronization is missing, this will lead to trouble. So maybe for 2.0 we should think about a radical different approach to reloading? Yes, I agree.
I am going to commit some code that I believe fixes this shortly. Unfortunately, I've been working on other stuff and that is going to get committed as well, so I am thinking I will create a branch for this so it can be inspected before it gets merged to trunk. I will be opening a separate jira issue for the other stuff I've been working on. A large part of the problem seems to be how hierarchical configurations are represented in memory and how they are being shared in a combined configuration. Also, I'm not even sure a "combined configuration" is really the right way to go. The composite configuration would probably be easier to deal with. Since that is all DefaultConfigurationBuilder will work with though, there isn't a lot of choice. I have committed my changes to https://svn.apache.org/repos/asf/commons/proper/configuration/branches/CONFIGURATION_390
1. CombinedConfiguration and SubnodeConfiguration both extend HierarchicalReloadableConfiguration. Ralph, your changes look good to me. I have the following questions/remarks:
Ultimately, we have to find a better solution than this.
And I guess I'm not done yet. Now that we know how we can make this happen in our production system. Unfortunately, even with these changes it is still happening. We use DynamicCombinedConfiguration and MultiFileHierarchicalConfiguration so I'm going to have to include those into the mix. I've made some more changes on the branch. I was getting out of memory errors in some unit tests because clear was not creating a new root node. I also included DynamicCombinedConfiguration in the locking and now it passes all the tests.
Although I may push this to trunk I am going to keep looking into this to see if there is a better way to deal with this as it is going to impact performance. More changes on the branch. I've been able to fix the problem without having to resort to locking all the method calls. When a combined configuration is constructed a copy of each participating configuration is used and locking is now done in appropriate places. This allows the reload to take place without affecting other threads as they will continue using a different tree until a subsequent getRootNode() call is made.
This sounds good. I guess creating copies of the configurations may be an issue for large data sets, but in most typical applications the amount of configuration data should be limited. And it is certainly preferable to get rid of all that locking spread all over the code.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||