Hadoop Common
  1. Hadoop Common
  2. HADOOP-4951

Lease monitor does not own the LeaseManager lock in changing leases.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.2
    • Fix Version/s: 0.18.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In Monitor.checkLeases(), the monitor thread does not own the LeaseManager lock but it may modify the leases.

      1. CMEinLeaseRecovery-0-18.patch
        1 kB
        Konstantin Shvachko
      2. 4951_20081229c_0.19.patch
        3 kB
        Tsz Wo Nicholas Sze
      3. 4951_20081229c_0.18.patch
        3 kB
        Tsz Wo Nicholas Sze
      4. 4951_20081229c.patch
        2 kB
        Tsz Wo Nicholas Sze
      5. 4951_20081229b.patch
        0.8 kB
        Tsz Wo Nicholas Sze
      6. 4951_20081229.patch
        0.9 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          6h 49m 1 Tsz Wo Nicholas Sze 30/Dec/08 02:57
          Resolved Resolved Reopened Reopened
          23h 55m 1 Konstantin Shvachko 31/Dec/08 02:52
          Reopened Reopened Resolved Resolved
          4m 10s 1 Konstantin Shvachko 31/Dec/08 02:56
          Resolved Resolved Closed Closed
          30d 17h 18m 1 Nigel Daley 30/Jan/09 20:14
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue relates to HADOOP-4795 [ HADOOP-4795 ]
          Konstantin Shvachko made changes -
          Link This issue relates to HADOOP-4961 [ HADOOP-4961 ]
          Konstantin Shvachko made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Konstantin Shvachko added a comment -

          It will be easier to track changes if I open a new issue. Closing this one.

          Show
          Konstantin Shvachko added a comment - It will be easier to track changes if I open a new issue. Closing this one.
          Konstantin Shvachko made changes -
          Attachment CMEinLeaseRecovery-0-18.patch [ 12396948 ]
          Konstantin Shvachko made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Konstantin Shvachko added a comment -

          This did not fix the ConcurrentModificationException. Although it corrected the synchronization problem.
          The exception now looks as

          Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@2244d990" java.util.ConcurrentModificationException
                  at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
                  at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
                  at org.apache.hadoop.dfs.LeaseManager.checkLeases(LeaseManager.java:371)
                  at org.apache.hadoop.dfs.LeaseManager.access$800(LeaseManager.java:51)
                  at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346)
                  at java.lang.Thread.run(Thread.java:619)
          

          Looks like the problem is that internalReleaseLease() finalizes empty files, which removes these file path names from the lease. So this modifies the Collection of file names which we are iterating on.
          An additional patch need to be supplied.

          Show
          Konstantin Shvachko added a comment - This did not fix the ConcurrentModificationException. Although it corrected the synchronization problem. The exception now looks as Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@2244d990" java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at org.apache.hadoop.dfs.LeaseManager.checkLeases(LeaseManager.java:371) at org.apache.hadoop.dfs.LeaseManager.access$800(LeaseManager.java:51) at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346) at java.lang. Thread .run( Thread .java:619) Looks like the problem is that internalReleaseLease() finalizes empty files, which removes these file path names from the lease. So this modifies the Collection of file names which we are iterating on. An additional patch need to be supplied.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > And what is the effect of this bug?

          The LeaseManager lock is used to protect the lease related data structures. So the bug leads to a race condition on the leases.

          Show
          Tsz Wo Nicholas Sze added a comment - > And what is the effect of this bug? The LeaseManager lock is used to protect the lease related data structures. So the bug leads to a race condition on the leases.
          Hide
          Konstantin Shvachko added a comment -

          This is the exception and the stack trace.

          Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@3dbe8711"
          java.util.ConcurrentModificationException
                  at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
                  at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
                  at org.apache.hadoop.dfs.LeaseManager$Monitor.checkLeases(LeaseManager.java:370)
                  at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346)
                  at java.lang.Thread.run(Thread.java:619)
          

          Another problem with this issue is that the name-node was running without lease monitor as if nothing happened. As a result, a lot of abandoned files have not been garbage collected.
          This patch fixes the ConcurrentModificationException but does not fix the problem of NN running without the lease monitor. I'll file another jira for that.

          Show
          Konstantin Shvachko added a comment - This is the exception and the stack trace. Exception in thread "org.apache.hadoop.dfs.LeaseManager$Monitor@3dbe8711" java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at org.apache.hadoop.dfs.LeaseManager$Monitor.checkLeases(LeaseManager.java:370) at org.apache.hadoop.dfs.LeaseManager$Monitor.run(LeaseManager.java:346) at java.lang. Thread .run( Thread .java:619) Another problem with this issue is that the name-node was running without lease monitor as if nothing happened. As a result, a lot of abandoned files have not been garbage collected. This patch fixes the ConcurrentModificationException but does not fix the problem of NN running without the lease monitor. I'll file another jira for that.
          Hide
          Koji Noguchi added a comment -

          Nicholas, could you attach the stack trace you observed?
          And what is the effect of this bug?

          Show
          Koji Noguchi added a comment - Nicholas, could you attach the stack trace you observed? And what is the effect of this bug?
          Tsz Wo Nicholas Sze made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.18.3 [ 12313494 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I just committed this.

          Show
          Tsz Wo Nicholas Sze added a comment - I just committed this.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Tested locally. Only TestMapReduceLocal failed but it is not related to this. See HADOOP-4907.

          Show
          Tsz Wo Nicholas Sze added a comment - Tested locally. Only TestMapReduceLocal failed but it is not related to this. See HADOOP-4907 .
          Tsz Wo Nicholas Sze made changes -
          Attachment 4951_20081229c_0.19.patch [ 12396878 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4951_20081229c_0.19.patch: for 0.19

          Show
          Tsz Wo Nicholas Sze added a comment - 4951_20081229c_0.19.patch: for 0.19
          Tsz Wo Nicholas Sze made changes -
          Attachment 4951_20081229c_0.18.patch [ 12396875 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4951_20081229c_0.18.patch: for 0.18

          Show
          Tsz Wo Nicholas Sze added a comment - 4951_20081229c_0.18.patch: for 0.18
          Tsz Wo Nicholas Sze made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          Hadoop Flags [Reviewed]
          Hide
          Tsz Wo Nicholas Sze added a comment -
               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          

          No new tests added since I don't know how to create a meaningful test for this.

          Show
          Tsz Wo Nicholas Sze added a comment - [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. No new tests added since I don't know how to create a meaningful test for this.
          Hide
          Raghu Angadi added a comment -

          +1. Looks good.

          Show
          Raghu Angadi added a comment - +1. Looks good.
          Tsz Wo Nicholas Sze made changes -
          Attachment 4951_20081229c.patch [ 12396872 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > or you could move checkLeases() to LeaseMonitor class (where it actually belongs).

          You are right. I should move it out.

          4951_20081229c.patch: moved checkLeases() and synchronized toString().

          Show
          Tsz Wo Nicholas Sze added a comment - > or you could move checkLeases() to LeaseMonitor class (where it actually belongs). You are right. I should move it out. 4951_20081229c.patch: moved checkLeases() and synchronized toString().
          Hide
          Raghu Angadi added a comment -

          or you could move checkLeases() to LeaseMonitor class (where it actually belongs).

          Show
          Raghu Angadi added a comment - or you could move checkLeases() to LeaseMonitor class (where it actually belongs).
          Hide
          Raghu Angadi added a comment -

          Fix looks fine. I think it is better to properly indent the code even if it means a bigger patch.

          Show
          Raghu Angadi added a comment - Fix looks fine. I think it is better to properly indent the code even if it means a bigger patch.
          Tsz Wo Nicholas Sze made changes -
          Attachment 4951_20081229b.patch [ 12396871 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4951_20081229b.patch: acquire the lock in checkLeases()

          Show
          Tsz Wo Nicholas Sze added a comment - 4951_20081229b.patch: acquire the lock in checkLeases()
          Tsz Wo Nicholas Sze made changes -
          Field Original Value New Value
          Attachment 4951_20081229.patch [ 12396858 ]
          Hide
          Tsz Wo Nicholas Sze added a comment -

          4951_20081229.patch: acquire the LeaseManager lock but not the Monitor lock

          Show
          Tsz Wo Nicholas Sze added a comment - 4951_20081229.patch: acquire the LeaseManager lock but not the Monitor lock
          Tsz Wo Nicholas Sze created issue -

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development