Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1840

Terminate LeaseChecker when all writing files are closed.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: hdfs-client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In DFSClient, when there are files opened for write, a LeaseChecker thread is started for updating the leases periodically. However, it never terminates when when all writing files are closed.

      1. h1840_20110418.patch
        8 kB
        Tsz Wo Nicholas Sze
      2. h1840_20110419.patch
        13 kB
        Tsz Wo Nicholas Sze
      3. h1840_20110419b.patch
        12 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Hide
          Tsz Wo Nicholas Sze added a comment -

          h1840_20110418.patch: terminate the existing daemon and start a new daemon when the size of pendingCreates changes from 1 to 0 and from 0 to 1, respectively.

          Show
          Tsz Wo Nicholas Sze added a comment - h1840_20110418.patch: terminate the existing daemon and start a new daemon when the size of pendingCreates changes from 1 to 0 and from 0 to 1, respectively.
          Hide
          Suresh Srinivas added a comment -

          Why is terminating the thread important?

          Show
          Suresh Srinivas added a comment - Why is terminating the thread important?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12476650/h1840_20110418.patch
          against trunk revision 1094092.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/380//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/380//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/380//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12476650/h1840_20110418.patch against trunk revision 1094092. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.TestFileConcurrentReader -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/380//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/380//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/380//console This message is automatically generated.
          Hide
          Suresh Srinivas added a comment -
          1. DFSClient.java
            • hdfsTimeout is initalized after LeaseChecker construction. Hence LeaseChecker#renewal may not be initialized right?
            • In DFSClient#run - the first time lastRenewed could be == renewal and hence lease is may not be renewed the first time
            • currentID is not synchronized correctly. There is a possiblity of threads lingering longer, because of this. You can avoid this id, by interrupting the current Daemon in remove method and shutting down the thread.
            • minor: //no more being written files, terminate. to // no more files being written
          Show
          Suresh Srinivas added a comment - DFSClient.java hdfsTimeout is initalized after LeaseChecker construction. Hence LeaseChecker#renewal may not be initialized right? In DFSClient#run - the first time lastRenewed could be == renewal and hence lease is may not be renewed the first time currentID is not synchronized correctly. There is a possiblity of threads lingering longer, because of this. You can avoid this id, by interrupting the current Daemon in remove method and shutting down the thread. minor: //no more being written files, terminate. to // no more files being written
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Thanks Suresh for the comments.

          > Why is terminating the thread important?
          Otherwise, the number of threads keeps increasing when there are multiple DFSClient. This is particular important for server applications like JobTracker or Oozie.

          > 1. DFSClient.java ...
          I will change everything in my next patch.

          Show
          Tsz Wo Nicholas Sze added a comment - Thanks Suresh for the comments. > Why is terminating the thread important? Otherwise, the number of threads keeps increasing when there are multiple DFSClient . This is particular important for server applications like JobTracker or Oozie. > 1. DFSClient.java ... I will change everything in my next patch.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h1840_20110419.patch:

          • incorporated Suresh's comments
          • added a grace period so that the thread won't have to restart if the user application open and close files frequently.
          Show
          Tsz Wo Nicholas Sze added a comment - h1840_20110419.patch: incorporated Suresh's comments added a grace period so that the thread won't have to restart if the user application open and close files frequently.
          Hide
          Suresh Srinivas added a comment -

          Minor comments:

          1. For test configurability instead of new config option, I prefer using a package private method, with comments "to be used for testing".
          2. Instead of fragmented explanation of how the grace period works, complete description of the timeout functionality either in javadoc of class of run method would be better.
          Show
          Suresh Srinivas added a comment - Minor comments: For test configurability instead of new config option, I prefer using a package private method, with comments "to be used for testing". Instead of fragmented explanation of how the grace period works, complete description of the timeout functionality either in javadoc of class of run method would be better.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          h1840_20110419b.patch:

          • added setGraceSleepPeriod(..)
          • added more details in the javadoc of gracePeriod
               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 5 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec] 
               [exec]     +1 system test framework.  The patch passed system test framework compile.
          

          Running unit tests ...

          Show
          Tsz Wo Nicholas Sze added a comment - h1840_20110419b.patch: added setGraceSleepPeriod(..) added more details in the javadoc of gracePeriod [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 5 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. Running unit tests ...
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Passed all unit tests except TestAuthorizationFilter.

          Show
          Tsz Wo Nicholas Sze added a comment - Passed all unit tests except TestAuthorizationFilter .
          Hide
          Suresh Srinivas added a comment -

          +1 for the patch.

          Show
          Suresh Srinivas added a comment - +1 for the patch.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I have committed this.

          Show
          Tsz Wo Nicholas Sze added a comment - I have committed this.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)
          HDFS-1840. In DFSClient, terminate the lease renewing thread when all files being written are closed for a grace period, and start a new thread when new files are opened for write.

          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/ ) HDFS-1840 . In DFSClient, terminate the lease renewing thread when all files being written are closed for a grace period, and start a new thread when new files are opened for write.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #600 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/600/)

          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #600 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/600/ )
          Hide
          Cheng Leong added a comment -

          According to the thread dump of my cluster's JobTracker, there were over 40k LeaseChecker threads - I would imagine all the other Hadoop users should have seen this before as well? Shouldn't this cause some sort of performance degradation?

          Show
          Cheng Leong added a comment - According to the thread dump of my cluster's JobTracker, there were over 40k LeaseChecker threads - I would imagine all the other Hadoop users should have seen this before as well? Shouldn't this cause some sort of performance degradation?
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > ... Shouldn't this cause some sort of performance degradation?

          I do not understand what you mean. Could you give more details?

          Show
          Tsz Wo Nicholas Sze added a comment - > ... Shouldn't this cause some sort of performance degradation? I do not understand what you mean. Could you give more details?
          Hide
          Cheng Leong added a comment -

          So the number of threads keeps increasing (in my case, 40k+), isn't this bad for JobTracker? I am seeing some performance issue with JobTracker (web console taking 10 seconds to load), but I am not sure if this is related. Thanks.

          Show
          Cheng Leong added a comment - So the number of threads keeps increasing (in my case, 40k+), isn't this bad for JobTracker? I am seeing some performance issue with JobTracker (web console taking 10 seconds to load), but I am not sure if this is related. Thanks.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Before the patch in this issue, you are right that the number of threads increases as more and more users submitting jobs to JobTracker since the threads live forever even if some users stop submitting jobs. This issue is to fix the problem – changed the threads to that they will terminate. In the JobTracker case, the number of threads will decrease when users stop submitting new jobs. So this actually improves the performance.

          Show
          Tsz Wo Nicholas Sze added a comment - Before the patch in this issue, you are right that the number of threads increases as more and more users submitting jobs to JobTracker since the threads live forever even if some users stop submitting jobs. This issue is to fix the problem – changed the threads to that they will terminate. In the JobTracker case, the number of threads will decrease when users stop submitting new jobs. So this actually improves the performance.
          Hide
          Cheng Leong added a comment -

          yeah exactly. I was just surprised that no one else is seeing the same performance issue, otherwise your fix should be prioritized? (0.23 is like a couple years away) Maybe my question should be, have you seen JobTracker's performance being impacted by this bug before?

          Show
          Cheng Leong added a comment - yeah exactly. I was just surprised that no one else is seeing the same performance issue, otherwise your fix should be prioritized? (0.23 is like a couple years away) Maybe my question should be, have you seen JobTracker's performance being impacted by this bug before?
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Yes, we have seen this but we do not have 40k+ threads, probably only a few hundreds.

          > ... (0.23 is like a couple years away) ...

          Hopefully, it is only a a couple months away.

          Show
          Tsz Wo Nicholas Sze added a comment - Yes, we have seen this but we do not have 40k+ threads, probably only a few hundreds. > ... (0.23 is like a couple years away) ... Hopefully, it is only a a couple months away.

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development