Issue Details (XML | Word | Printable)

Key: CONFIGURATION-390
Type: Bug Bug
Status: Open Open
Priority: Major Major
Assignee: Unassigned
Reporter: Emmanuel Bourg
Votes: 1
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Commons Configuration

AbstractHierarchicalFileConfiguration is not thread safe

Created: 23/Jun/09 03:49 PM   Updated: 05/Oct/09 07:34 PM
Return to search
Component/s: File reloading
Affects Version/s: 1.6
Fix Version/s: 1.7

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works commons-configuration-390.patch 2009-09-26 03:35 PM Oliver Heger 5 kB
GZip Archive configtest.tar.gz 2009-09-25 05:30 PM Stephen Kinser 2 kB
Text File TestSuite.txt 2009-09-25 05:30 PM Stephen Kinser 231 kB
Issue Links:
Reference
 


 Description  « Hide
AbstractHierarchicalFileConfiguration doesn't implement the same locking mechanism found in AbstractFileConfiguration. The consequence is that getting a property while the configuration is being reloaded by another thread can return an invalid result.

This can be demonstrated by changing testDeadlockWithReload() in TestCombinedConfiguration to use an XMLConfiguration instead of a PropertiesConfiguration.

Here is a reduced test case:

public void testConcurrentGetAndReload() throws Exception
{
    //final FileConfiguration config = new PropertiesConfiguration("test.properties");
    final FileConfiguration config = new XMLConfiguration("test.xml");
    config.setReloadingStrategy(new FileAlwaysReloadingStrategy());

    assertTrue("Property not found", config.getProperty("test.short") != null);

    new Thread()
    {
        public void run()
        {
            for (int i = 0; i < 1000; i++)
            {
                config.reload();
            }
        }
    }.start();
    
    for (int i = 0; i < 1000; i++)
    {
        assertTrue("Property not found", config.getProperty("test.short") != null); // failure here
    }
}

The test doesn't always fail. It does about 50% of the time.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Stephen Kinser added a comment - 25/Sep/09 05:26 PM
I hit this same problem, though with a CombinedConfiguration made up of multiple XMLConfigurations. I am attaching a small maven project that demonstrates the problem, and fails about 50% of the time.

Stephen Kinser added a comment - 25/Sep/09 05:30 PM
I also get this problem randomly when multithreaded, though I don't know if this belongs in a different JIRA:

testMultiThreadedAccess(TestReloading) Time elapsed: 0.018 sec <<< FAILURE!
java.lang.IllegalStateException: Node cannot be modified when added to a parent!
at org.apache.commons.configuration.tree.DefaultConfigurationNode.checkState(DefaultConfigurationNode.java:454)
at org.apache.commons.configuration.tree.DefaultConfigurationNode.setAttribute(DefaultConfigurationNode.java:290)
at org.apache.commons.configuration.tree.DefaultConfigurationNode.addChild(DefaultConfigurationNode.java:184)
at org.apache.commons.configuration.tree.ViewNode.addChild(ViewNode.java:78)
at org.apache.commons.configuration.tree.ViewNode.appendChildren(ViewNode.java:114)
at org.apache.commons.configuration.CombinedConfiguration$ConfigData.getTransformedRoot(CombinedConfiguration.java:847)
at org.apache.commons.configuration.CombinedConfiguration.constructCombinedNode(CombinedConfiguration.java:699)
at org.apache.commons.configuration.CombinedConfiguration.getRootNode(CombinedConfiguration.java:542)
at org.apache.commons.configuration.HierarchicalConfiguration.fetchNodeList(HierarchicalConfiguration.java:926)
at org.apache.commons.configuration.CombinedConfiguration.fetchNodeList(CombinedConfiguration.java:653)
at org.apache.commons.configuration.HierarchicalConfiguration.getProperty(HierarchicalConfiguration.java:332)
at org.apache.commons.configuration.AbstractConfiguration.resolveContainerStore(AbstractConfiguration.java:1160)
at org.apache.commons.configuration.AbstractConfiguration.getString(AbstractConfiguration.java:1035)
at org.apache.commons.configuration.AbstractConfiguration.getString(AbstractConfiguration.java:1018)
at TestReloading.testMultiThreadedAccess(TestReloading.java:73)


Ralph Goers added a comment - 26/Sep/09 03:57 AM
Thanks for reporting this. It just so happens that I am also experience problems that are most likely due to reloading not being thread safe and was going to create a test case. It looks like you saved me the effort.

Oliver Heger added a comment - 26/Sep/09 02:52 PM
CONFIGURATION-344 contains some comments about threading problems with reload operations in general. They also are valid for this issue.

Oliver Heger added a comment - 26/Sep/09 03:35 PM
I have created a unit test that fails more reliably and a proposed fix.

The idea is that the lock used by the reload() method is exposed so that external synchronization can be performed. AbstractHierarchicalFileConfiguration now synchronizes at this lock in its getProperty() method. This does not fix all problems with this architecture, e.g. other methods like isEmpty() or containsKey() should be treated in a similar way. But concurrent property reads should now be possible.


Ralph Goers added a comment - 26/Sep/09 05:34 PM
Oliver, I have created a very similar patch but I am using the reloading lock on a lot more methods such as containsKey(), etc. This is necessary because all these methods are at risk of having the underlying structure change, not just getProperty.

However, CombinedConfiguration seems to be a tougher nut to crack. The test I am using is below and is a variation on the XMLConfiguration test case and was added to TestCombinedConfiguration and fails every time.
public void testConcurrentGetAndReload() throws Exception
{
config.setForceReloadCheck(true);
config.setNodeCombiner(new MergeCombiner());
//final FileConfiguration config = new PropertiesConfiguration("test.properties");
final XMLConfiguration xml = new XMLConfiguration("test.xml");
xml.setReloadingStrategy(new FileAlwaysReloadingStrategy());
config.addConfiguration(xml);

assertTrue("Property not found", config.getProperty("test.short") != null);

Thread testThreads[] = new Thread[5];

for (int i = 0; i < testThreads.length; ++i)

{ testThreads[i] = new ReloadThread(xml); testThreads[i].start(); }

for (int i = 0; i < 2000; i++)

{ assertTrue("Property not found", config.getProperty("test.short") != null); }

for (int i = 0; i < testThreads.length; ++i)

{ testThreads[i].join(); }

}

private class ReloadThread extends Thread
{
FileConfiguration config;

ReloadThread(FileConfiguration config)

{ this.config = config; }

public void run()
{
for (int i = 0; i < 1000; i++)

{ config.reload(); }

}
}


Oliver Heger added a comment - 26/Sep/09 08:03 PM
Okay, using the reloading lock for all methods that might be affected is certainly the way to go (I was only too lazy to do this But to be consistent, analogous changes should be applied to AbstractFileConfiguration: Here we have the same pattern that methods first call reload() and then do something with the data, which might already be invalid due to another reload() operation.

Just an idea: Would it make sense to have a method like syncReload(Accessor acc) in AbstractFileConfiguration like the following:

protected interface Accessor
{
    Object access(FileConfiguration config);
}

protected Object syncReload(Accessor acc)
{
    synchronized(reloadLock)
    {
        reload();
        return acc.access(this);
    }
}

Maybe this would reduce the clutter in the affected methods a bit and allow a more closure-oriented programming.

Now to CombinedConfiguration: AFAICT the problem lies in the getTransformedRoot() method of the helper class ConfigData. We have here the following code:

// Copy data of the root node to the new path
rootNode = ConfigurationUtils.convertToHierarchical(getConfiguration(),
    getConversionExpressionEngine()).getRootNode();
atParent.appendChildren(rootNode);
atParent.appendAttributes(rootNode);

To be save, this fragment has to be executed with the reload lock of the processed configuration hold. A dirty hack could be to add the getReloadLock() method to the FileConfiguration interface and check whether the configuration implements this interface:

if(getConfiguration() instanceof FileConfiguration)
{
    synchronized(((FileConfiguration) getConfiguration()).getReloadLock())
    {
        // Copy data of the root node to the new path
        ...
    }
}

However, changing the FileConfiguration interface is a binary incompatible change. So we might have to introduce an additional interface solely for this purpose.

WDYT?


Ralph Goers added a comment - 26/Sep/09 08:37 PM
I don't understand the Accessor thing. The issue is that various operations have to be synchronized since the underlying structure might change while the operation is being performed. I'm not sure how this construct helps.

With CombinedConfiguration I also have focused on that bit of code and syncrhonized there pretty much the way you have suggested but the use case I posted above still fails. From what I can tell this is because in an XMLConfiguration the root node is cleared and then reloaded. Apparently getTransformedRoot() adds the nodes from the actual HierarchicalConfiguration into the CombinedConfiguration instead of making copies of them. This means that once the reloadLock is released the CombinedConfiguration can be modified asynchronously.

The above shouldn't be a problem if it simply isn't allowed to use a Configuration object directly once it is incorporated into a combined configuration.


Ralph Goers added a comment - 27/Sep/09 12:35 AM - edited
After a few more minutes of thought I think even this restriction won't solve the problem. If two threads access the Combined Configuration, one may get through getRootNode and start working with it. The second thread will detect a reload and start into constructCombinedNode and that will screw up the thread that is working with it even though they will have different "combinedRoot" nodes, because the underlying Nodes from the configurations being combined are being messed with.

I should also mention that I took configtest.tar.gz attached to this issue and tried it with the modifications to lock both getRootNode() and the actual HierarchicalConfiguration in getTransformedRoot() and it made absolutely no difference. It still fails most of the time.

I can only think of two ways out of this:
1. Clone the underlying configurations. This is a really ugly solution as something like MultiFileHierachicalConfiguration won't be pretty to clone.
2. Synchronize the public methods that need access to the root node more or less the same as is needed in XMLConfiguration. Again, this can be optimized in 2.0 by using Read/Write locks. I'd love to do this in 1.7 but I really don't want to introduce a dependency on util.concurrent.


Ralph Goers added a comment - 27/Sep/09 06:26 AM
I was able to get my test to pass by synchronizing the public methods. This gets even more interesting because of SubnodeConfiguration objects.

I'm beginning to think that AbstractHierarchicalFileConfiguration, CombinedConfiguration and SubnodeConfiguration all need to extend a new class named HierarchicalReloadableConfiguration. This class would provide the reload lock object (or set it to the lock provided by the parent) and then lock all the appropriate methods. Subclasses wouldn't then have to provide overrides for all the methods and would be able to lock with the appropriate lock when they do override methods.


Oliver Heger added a comment - 27/Sep/09 07:38 PM
A common base class that handles the locking stuff sounds like a good solution.

I fear, in its current state the code is hardly maintainable. A lot of methods have to aware of the reloading functionality, and if a synchronization is missing, this will lead to trouble. So maybe for 2.0 we should think about a radical different approach to reloading?


Ralph Goers added a comment - 27/Sep/09 08:47 PM - edited
Yes, I agree.

I am going to commit some code that I believe fixes this shortly. Unfortunately, I've been working on other stuff and that is going to get committed as well, so I am thinking I will create a branch for this so it can be inspected before it gets merged to trunk. I will be opening a separate jira issue for the other stuff I've been working on.

A large part of the problem seems to be how hierarchical configurations are represented in memory and how they are being shared in a combined configuration. Also, I'm not even sure a "combined configuration" is really the right way to go. The composite configuration would probably be easier to deal with. Since that is all DefaultConfigurationBuilder will work with though, there isn't a lot of choice.


Ralph Goers added a comment - 27/Sep/09 10:46 PM
I have committed my changes to https://svn.apache.org/repos/asf/commons/proper/configuration/branches/CONFIGURATION_390. Please inspect the changes and test.

1. CombinedConfiguration and SubnodeConfiguration both extend HierarchicalReloadableConfiguration.
2. HierarchicalReloadableConfiguration implements Reloadable.
3. AbstractHierarchicalFileConfiguration implements Reloadable as delegate.getReloadLock().
4. AbstractHierarchicalFileConfiguration wraps many of the methods with synchronized(delegate.getReloadLock()).
5. HierarchicalReloadableConfiguration has a constructor that can be passed the lock to synchronize on. If no lock is passed a new Object is used.
6. SubnodeConfiguration's constructor gets the parents lock object if the parent is Reloadable and passes it to HierarchicalReloadableConfiguraton's constructor. This should cause any SubnodeConfiguration for XMLConfiguration or CombinedConfiguration to use the lock of its parent.


Oliver Heger added a comment - 28/Sep/09 08:19 PM
Ralph, your changes look good to me. I have the following questions/remarks:
  • Can we be sure that the problem described in CONFIGURATION-344 does not pop up again? With the lock used by the CombinedConfiguration and the locks used by child configurations there are multiple locks involved which may be prone to dead lock under certain circumstances.
  • I wonder whether SubnodeConfiguration.getRootNode() needs locking, too. Probably not, because it is mainly called internally by methods that should already have the lock. But I am not sure.

Ralph Goers added a comment - 29/Sep/09 05:05 AM
Ultimately, we have to find a better solution than this.

And I guess I'm not done yet. Now that we know how we can make this happen in our production system. Unfortunately, even with these changes it is still happening. We use DynamicCombinedConfiguration and MultiFileHierarchicalConfiguration so I'm going to have to include those into the mix.


Ralph Goers added a comment - 29/Sep/09 11:44 PM
I've made some more changes on the branch. I was getting out of memory errors in some unit tests because clear was not creating a new root node. I also included DynamicCombinedConfiguration in the locking and now it passes all the tests.

Although I may push this to trunk I am going to keep looking into this to see if there is a better way to deal with this as it is going to impact performance.


Ralph Goers added a comment - 05/Oct/09 02:12 AM
More changes on the branch. I've been able to fix the problem without having to resort to locking all the method calls. When a combined configuration is constructed a copy of each participating configuration is used and locking is now done in appropriate places. This allows the reload to take place without affecting other threads as they will continue using a different tree until a subsequent getRootNode() call is made.

Oliver Heger added a comment - 05/Oct/09 07:34 PM
This sounds good. I guess creating copies of the configurations may be an issue for large data sets, but in most typical applications the amount of configuration data should be limited. And it is certainly preferable to get rid of all that locking spread all over the code.