Issue Details (XML | Word | Printable)

Key: HADOOP-4951
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Blocker Blocker
Assignee: Tsz Wo (Nicholas), SZE
Reporter: Tsz Wo (Nicholas), SZE
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Lease monitor does not own the LeaseManager lock in changing leases.

Created: 29/Dec/08 08:07 PM   Updated: 08/Jul/09 04:43 PM
Return to search
Component/s: None
Affects Version/s: 0.18.2
Fix Version/s: 0.18.3

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 4951_20081229.patch 2008-12-29 08:18 PM Tsz Wo (Nicholas), SZE 0.9 kB
Text File Licensed for inclusion in ASF works 4951_20081229b.patch 2008-12-29 11:06 PM Tsz Wo (Nicholas), SZE 0.8 kB
Text File Licensed for inclusion in ASF works 4951_20081229c.patch 2008-12-29 11:24 PM Tsz Wo (Nicholas), SZE 2 kB
Text File Licensed for inclusion in ASF works 4951_20081229c_0.18.patch 2008-12-30 12:31 AM Tsz Wo (Nicholas), SZE 3 kB
Text File Licensed for inclusion in ASF works 4951_20081229c_0.19.patch 2008-12-30 12:55 AM Tsz Wo (Nicholas), SZE 3 kB
Text File Licensed for inclusion in ASF works CMEinLeaseRecovery-0-18.patch 2008-12-31 02:53 AM Konstantin Shvachko 1 kB
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Resolution Date: 31/Dec/08 02:56 AM


 Description  « Hide
In Monitor.checkLeases(), the monitor thread does not own the LeaseManager lock but it may modify the leases.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Tsz Wo (Nicholas), SZE added a comment - 29/Dec/08 08:18 PM
4951_20081229.patch: acquire the LeaseManager lock but not the Monitor lock

Tsz Wo (Nicholas), SZE added a comment - 29/Dec/08 11:06 PM
4951_20081229b.patch: acquire the lock in checkLeases()

Raghu Angadi added a comment - 29/Dec/08 11:10 PM
Fix looks fine. I think it is better to properly indent the code even if it means a bigger patch.

Raghu Angadi added a comment - 29/Dec/08 11:12 PM
or you could move checkLeases() to LeaseMonitor class (where it actually belongs).

Tsz Wo (Nicholas), SZE added a comment - 29/Dec/08 11:24 PM
> or you could move checkLeases() to LeaseMonitor class (where it actually belongs).

You are right. I should move it out.

4951_20081229c.patch: moved checkLeases() and synchronized toString().


Raghu Angadi added a comment - 29/Dec/08 11:40 PM
+1. Looks good.

Tsz Wo (Nicholas), SZE added a comment - 30/Dec/08 12:10 AM
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

No new tests added since I don't know how to create a meaningful test for this.


Tsz Wo (Nicholas), SZE added a comment - 30/Dec/08 12:31 AM
4951_20081229c_0.18.patch: for 0.18

Tsz Wo (Nicholas), SZE added a comment - 30/Dec/08 12:55 AM
4951_20081229c_0.19.patch: for 0.19

Tsz Wo (Nicholas), SZE added a comment - 30/Dec/08 01:58 AM
Tested locally. Only TestMapReduceLocal failed but it is not related to this. See HADOOP-4907.

Tsz Wo (Nicholas), SZE added a comment - 30/Dec/08 02:57 AM
I just committed this.

Koji Noguchi added a comment - 30/Dec/08 05:00 PM
Nicholas, could you attach the stack trace you observed?
And what is the effect of this bug?

Konstantin Shvachko added a comment - 30/Dec/08 06:06 PM
This is the exception and the stack trace.
Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@3dbe8711"
java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
        at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
        at org.apache.hadoop.dfs.LeaseManager$Monitor.checkLeases(LeaseManager.java:370)
        at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346)
        at java.lang.Thread.run(Thread.java:619)

Another problem with this issue is that the name-node was running without lease monitor as if nothing happened. As a result, a lot of abandoned files have not been garbage collected.
This patch fixes the ConcurrentModificationException but does not fix the problem of NN running without the lease monitor. I'll file another jira for that.


Tsz Wo (Nicholas), SZE added a comment - 30/Dec/08 07:18 PM
> And what is the effect of this bug?

The LeaseManager lock is used to protect the lease related data structures. So the bug leads to a race condition on the leases.


Konstantin Shvachko added a comment - 31/Dec/08 02:52 AM
This did not fix the ConcurrentModificationException. Although it corrected the synchronization problem.
The exception now looks as
Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@2244d990" java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
        at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
        at org.apache.hadoop.dfs.LeaseManager.checkLeases(LeaseManager.java:371)
        at org.apache.hadoop.dfs.LeaseManager.access$800(LeaseManager.java:51)
        at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346)
        at java.lang.Thread.run(Thread.java:619)

Looks like the problem is that internalReleaseLease() finalizes empty files, which removes these file path names from the lease. So this modifies the Collection of file names which we are iterating on.
An additional patch need to be supplied.


Konstantin Shvachko added a comment - 31/Dec/08 02:56 AM
It will be easier to track changes if I open a new issue. Closing this one.