|
[
Permlink
| « Hide
]
Tsz Wo (Nicholas), SZE added a comment - 29/Dec/08 08:18 PM
4951_20081229.patch: acquire the LeaseManager lock but not the Monitor lock
4951_20081229b.patch: acquire the lock in checkLeases()
Fix looks fine. I think it is better to properly indent the code even if it means a bigger patch.
or you could move checkLeases() to LeaseMonitor class (where it actually belongs).
> or you could move checkLeases() to LeaseMonitor class (where it actually belongs).
You are right. I should move it out. 4951_20081229c.patch: moved checkLeases() and synchronized toString(). [exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
No new tests added since I don't know how to create a meaningful test for this. 4951_20081229c_0.18.patch: for 0.18
4951_20081229c_0.19.patch: for 0.19
Tested locally. Only TestMapReduceLocal failed but it is not related to this. See
I just committed this.
Nicholas, could you attach the stack trace you observed?
And what is the effect of this bug? This is the exception and the stack trace.
Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@3dbe8711" java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at org.apache.hadoop.dfs.LeaseManager$Monitor.checkLeases(LeaseManager.java:370) at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346) at java.lang.Thread.run(Thread.java:619) Another problem with this issue is that the name-node was running without lease monitor as if nothing happened. As a result, a lot of abandoned files have not been garbage collected. > And what is the effect of this bug?
The LeaseManager lock is used to protect the lease related data structures. So the bug leads to a race condition on the leases. This did not fix the ConcurrentModificationException. Although it corrected the synchronization problem.
The exception now looks as Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@2244d990" java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at org.apache.hadoop.dfs.LeaseManager.checkLeases(LeaseManager.java:371) at org.apache.hadoop.dfs.LeaseManager.access$800(LeaseManager.java:51) at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346) at java.lang.Thread.run(Thread.java:619) Looks like the problem is that internalReleaseLease() finalizes empty files, which removes these file path names from the lease. So this modifies the Collection of file names which we are iterating on. It will be easier to track changes if I open a new issue. Closing this one.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||