Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4479

logSync() with the FSNamesystem lock held in commitBlockSynchronization

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In FSNamesystem#commitBlockSynchronization of branch-1, logSync() may be called when the FSNamesystem lock is held. Similar to HDFS-4186, this may cause some performance issue.

      The following issue was observed in a cluster that was running a Hive job and was writing to 100,000 temporary files (each task is writing to 1000s of files). When this job is killed, a large number of files are left open for write. Eventually when the lease for open files expires, lease recovery is started for all these files in a very short duration of time. This causes a large number of commitBlockSynchronization where logSync is performed with the FSNamesystem lock held. This overloads the namenode resulting in slowdown.

      Since logSync is called right after the synchronization section, we can simply remove the logSync call.

      1. HDFS-4479.b1.002.patch
        2 kB
        Jing Zhao
      2. HDFS-4479.b1.001.patch
        0.6 kB
        Jing Zhao

        Issue Links

          Activity

          Jing Zhao created issue -
          Jing Zhao made changes -
          Field Original Value New Value
          Affects Version/s 1.2.0 [ 12321657 ]
          Hide
          Jing Zhao added a comment -

          Patch uploaded.

          Show
          Jing Zhao added a comment - Patch uploaded.
          Jing Zhao made changes -
          Attachment HDFS-4479.b1.001.patch [ 12568518 ]
          Hide
          Jing Zhao added a comment -

          The simple change passed all the unit tests locally. The test-patch result:

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     -1 findbugs.  The patch appears to introduce 232 new Findbugs (version 2.0.1) warnings.
          
          Show
          Jing Zhao added a comment - The simple change passed all the unit tests locally. The test-patch result: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 232 new Findbugs (version 2.0.1) warnings.
          Hide
          Suresh Srinivas added a comment -

          Jing, after the place where logSync is removed, there is a return. Should we do logSync only once in possibly try finally?

          Show
          Suresh Srinivas added a comment - Jing, after the place where logSync is removed, there is a return. Should we do logSync only once in possibly try finally?
          Hide
          Jing Zhao added a comment -

          Thanks for the comments Suresh! Update the patch based on your comments.

          Show
          Jing Zhao added a comment - Thanks for the comments Suresh! Update the patch based on your comments.
          Jing Zhao made changes -
          Attachment HDFS-4479.b1.002.patch [ 12570345 ]
          Hide
          Suresh Srinivas added a comment -

          +1 for the patch. Thanks for not indenting the try block to make the review easy. I am going to indent it before committing the code.

          Show
          Suresh Srinivas added a comment - +1 for the patch. Thanks for not indenting the try block to make the review easy. I am going to indent it before committing the code.
          Suresh Srinivas made changes -
          Summary In branch-1, logSync() may be called with the FSNamesystem lock held in commitBlockSynchronization logSync() with the FSNamesystem lock held in commitBlockSynchronization
          Hide
          Suresh Srinivas added a comment -

          I committed the patch to branch-1. Thank you Jing.

          Show
          Suresh Srinivas added a comment - I committed the patch to branch-1. Thank you Jing.
          Suresh Srinivas made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 1.2.0 [ 12321657 ]
          Resolution Fixed [ 1 ]
          Suresh Srinivas made changes -
          Link This issue is related to HDFS-4186 [ HDFS-4186 ]
          Suresh Srinivas made changes -
          Description In FSNamesystem#commitBlockSynchronization of branch-1, logSync() may be called when the FSNamesystem lock is held. Similar with HDFS-4186, this may cause some performance issue.

          Since logSync is called right after the synchronization section, we can simply remove the logSync call.
          In FSNamesystem#commitBlockSynchronization of branch-1, logSync() may be called when the FSNamesystem lock is held. Similar to HDFS-4186, this may cause some performance issue.

          The following issue was observed in a cluster that was running a Hive job and was writing to 100,000 temporary files (each task is writing to 1000s of files). When this job is killed, a large number of files are left open for write. Eventually when the lease for open files expires, lease recovery is started for all these files in a very short duration of time. This causes a large number of commitBlockSynchronization where logSync is performed with the FSNamesystem lock held. This overloads the namenode resulting in slowdown.

          Since logSync is called right after the synchronization section, we can simply remove the logSync call.
          Hide
          Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.

          Show
          Matt Foley added a comment - Closed upon release of Hadoop 1.2.0.
          Matt Foley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          16d 19h 36m 1 Suresh Srinivas 24/Feb/13 21:56
          Resolved Resolved Closed Closed
          79d 7h 19m 1 Matt Foley 15/May/13 06:16

            People

            • Assignee:
              Jing Zhao
              Reporter:
              Jing Zhao
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development