Hadoop Common
  1. Hadoop Common
  2. HADOOP-4951

Lease monitor does not own the LeaseManager lock in changing leases.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.2
    • Fix Version/s: 0.18.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In Monitor.checkLeases(), the monitor thread does not own the LeaseManager lock but it may modify the leases.

      1. CMEinLeaseRecovery-0-18.patch
        1 kB
        Konstantin Shvachko
      2. 4951_20081229c_0.19.patch
        3 kB
        Tsz Wo Nicholas Sze
      3. 4951_20081229c_0.18.patch
        3 kB
        Tsz Wo Nicholas Sze
      4. 4951_20081229c.patch
        2 kB
        Tsz Wo Nicholas Sze
      5. 4951_20081229b.patch
        0.8 kB
        Tsz Wo Nicholas Sze
      6. 4951_20081229.patch
        0.9 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Hide
          Tsz Wo Nicholas Sze added a comment -

          4951_20081229.patch: acquire the LeaseManager lock but not the Monitor lock

          Show
          Tsz Wo Nicholas Sze added a comment - 4951_20081229.patch: acquire the LeaseManager lock but not the Monitor lock
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4951_20081229b.patch: acquire the lock in checkLeases()

          Show
          Tsz Wo Nicholas Sze added a comment - 4951_20081229b.patch: acquire the lock in checkLeases()
          Hide
          Raghu Angadi added a comment -

          Fix looks fine. I think it is better to properly indent the code even if it means a bigger patch.

          Show
          Raghu Angadi added a comment - Fix looks fine. I think it is better to properly indent the code even if it means a bigger patch.
          Hide
          Raghu Angadi added a comment -

          or you could move checkLeases() to LeaseMonitor class (where it actually belongs).

          Show
          Raghu Angadi added a comment - or you could move checkLeases() to LeaseMonitor class (where it actually belongs).
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > or you could move checkLeases() to LeaseMonitor class (where it actually belongs).

          You are right. I should move it out.

          4951_20081229c.patch: moved checkLeases() and synchronized toString().

          Show
          Tsz Wo Nicholas Sze added a comment - > or you could move checkLeases() to LeaseMonitor class (where it actually belongs). You are right. I should move it out. 4951_20081229c.patch: moved checkLeases() and synchronized toString().
          Hide
          Raghu Angadi added a comment -

          +1. Looks good.

          Show
          Raghu Angadi added a comment - +1. Looks good.
          Hide
          Tsz Wo Nicholas Sze added a comment -
               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          

          No new tests added since I don't know how to create a meaningful test for this.

          Show
          Tsz Wo Nicholas Sze added a comment - [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. No new tests added since I don't know how to create a meaningful test for this.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4951_20081229c_0.18.patch: for 0.18

          Show
          Tsz Wo Nicholas Sze added a comment - 4951_20081229c_0.18.patch: for 0.18
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4951_20081229c_0.19.patch: for 0.19

          Show
          Tsz Wo Nicholas Sze added a comment - 4951_20081229c_0.19.patch: for 0.19
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Tested locally. Only TestMapReduceLocal failed but it is not related to this. See HADOOP-4907.

          Show
          Tsz Wo Nicholas Sze added a comment - Tested locally. Only TestMapReduceLocal failed but it is not related to this. See HADOOP-4907 .
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I just committed this.

          Show
          Tsz Wo Nicholas Sze added a comment - I just committed this.
          Hide
          Koji Noguchi added a comment -

          Nicholas, could you attach the stack trace you observed?
          And what is the effect of this bug?

          Show
          Koji Noguchi added a comment - Nicholas, could you attach the stack trace you observed? And what is the effect of this bug?
          Hide
          Konstantin Shvachko added a comment -

          This is the exception and the stack trace.

          Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@3dbe8711"
          java.util.ConcurrentModificationException
                  at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
                  at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
                  at org.apache.hadoop.dfs.LeaseManager$Monitor.checkLeases(LeaseManager.java:370)
                  at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346)
                  at java.lang.Thread.run(Thread.java:619)
          

          Another problem with this issue is that the name-node was running without lease monitor as if nothing happened. As a result, a lot of abandoned files have not been garbage collected.
          This patch fixes the ConcurrentModificationException but does not fix the problem of NN running without the lease monitor. I'll file another jira for that.

          Show
          Konstantin Shvachko added a comment - This is the exception and the stack trace. Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@3dbe8711" java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at org.apache.hadoop.dfs.LeaseManager$Monitor.checkLeases(LeaseManager.java:370) at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346) at java.lang. Thread .run( Thread .java:619) Another problem with this issue is that the name-node was running without lease monitor as if nothing happened. As a result, a lot of abandoned files have not been garbage collected. This patch fixes the ConcurrentModificationException but does not fix the problem of NN running without the lease monitor. I'll file another jira for that.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > And what is the effect of this bug?

          The LeaseManager lock is used to protect the lease related data structures. So the bug leads to a race condition on the leases.

          Show
          Tsz Wo Nicholas Sze added a comment - > And what is the effect of this bug? The LeaseManager lock is used to protect the lease related data structures. So the bug leads to a race condition on the leases.
          Hide
          Konstantin Shvachko added a comment -

          This did not fix the ConcurrentModificationException. Although it corrected the synchronization problem.
          The exception now looks as

          Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@2244d990" java.util.ConcurrentModificationException
                  at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
                  at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
                  at org.apache.hadoop.dfs.LeaseManager.checkLeases(LeaseManager.java:371)
                  at org.apache.hadoop.dfs.LeaseManager.access$800(LeaseManager.java:51)
                  at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346)
                  at java.lang.Thread.run(Thread.java:619)
          

          Looks like the problem is that internalReleaseLease() finalizes empty files, which removes these file path names from the lease. So this modifies the Collection of file names which we are iterating on.
          An additional patch need to be supplied.

          Show
          Konstantin Shvachko added a comment - This did not fix the ConcurrentModificationException. Although it corrected the synchronization problem. The exception now looks as Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@2244d990" java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at org.apache.hadoop.dfs.LeaseManager.checkLeases(LeaseManager.java:371) at org.apache.hadoop.dfs.LeaseManager.access$800(LeaseManager.java:51) at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346) at java.lang. Thread .run( Thread .java:619) Looks like the problem is that internalReleaseLease() finalizes empty files, which removes these file path names from the lease. So this modifies the Collection of file names which we are iterating on. An additional patch need to be supplied.
          Hide
          Konstantin Shvachko added a comment -

          It will be easier to track changes if I open a new issue. Closing this one.

          Show
          Konstantin Shvachko added a comment - It will be easier to track changes if I open a new issue. Closing this one.

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development