Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1093

Improve namenode scalability by splitting the FSNamesystem synchronized section in a read/write lock

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Most critical data structures in the NameNode (NN) are protected by a syncronized methods in the FSNamesystem class. This essentially makes critical code paths in the NN single-threaded. However, a large percentage of the NN calls are listStatus, getBlockLocations, etc which do not change internal data structures at all, these are read-only calls. If we change the FSNamesystem lock to a read/write lock, many of the above operations can occur in parallel, thus improving the scalability of the NN.

      1. NNreadwriteLock.txt
        53 kB
        dhruba borthakur
      2. NNreadwriteLock_trunk_1.txt
        60 kB
        dhruba borthakur
      3. NNreadwriteLock_trunk_2.txt
        60 kB
        Suresh Srinivas
      4. NNreadwriteLock_trunk_3.txt
        63 kB
        dhruba borthakur
      5. NNreadwriteLock_trunk_4.txt
        65 kB
        dhruba borthakur
      6. NNreadwriteLock_trunk_5.txt
        64 kB
        dhruba borthakur

        Issue Links

          Activity

          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Patch Available Patch Available Open Open
          42d 7h 29m 5 dhruba borthakur 16/Sep/10 17:01
          Open Open Patch Available Patch Available
          115d 2h 54m 6 dhruba borthakur 16/Sep/10 17:02
          Patch Available Patch Available Resolved Resolved
          6h 24m 1 dhruba borthakur 16/Sep/10 23:26
          Resolved Resolved Closed Closed
          451d 7h 52m 1 Konstantin Shvachko 12/Dec/11 06:19
          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Suresh Srinivas made changes -
          Link This issue relates to HDFS-1440 [ HDFS-1440 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #396 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/396/)

          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #396 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/396/ )
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Suresh for reviewing this big patch.

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Suresh for reviewing this big patch.
          Hide
          dhruba borthakur added a comment -

          All unit tests pass except the following:

          TestDataNodeVolumeFailure
          TestFileConcurrentReader
          TestBlockRecovery
          TestFileStatus

          I am going to commit this patch today evening (unless anybody has any concerns)

          Show
          dhruba borthakur added a comment - All unit tests pass except the following: TestDataNodeVolumeFailure TestFileConcurrentReader TestBlockRecovery TestFileStatus I am going to commit this patch today evening (unless anybody has any concerns)
          Hide
          dhruba borthakur added a comment -

          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
          [exec] Please justify why no new tests are needed for this patch.
          [exec] Also please list what manual steps were performed to verify this patch.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          [exec]
          [exec] +1 system tests framework. The patch passed system tests framework compile.

          Show
          dhruba borthakur added a comment - [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system tests framework. The patch passed system tests framework compile.
          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          dhruba borthakur made changes -
          Attachment NNreadwriteLock_trunk_5.txt [ 12454771 ]
          Hide
          dhruba borthakur added a comment -

          Merged patch with latest trunk.

          Show
          dhruba borthakur added a comment - Merged patch with latest trunk.
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          dhruba borthakur added a comment -

          Have to merge with latest trunk

          Show
          dhruba borthakur added a comment - Have to merge with latest trunk
          Hide
          Suresh Srinivas added a comment -

          +1 for the patch. Looks like patch needs to be updated for the latest trunk.

          Show
          Suresh Srinivas added a comment - +1 for the patch. Looks like patch needs to be updated for the latest trunk.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12454396/NNreadwriteLock_trunk_4.txt
          against trunk revision 997157.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/240/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12454396/NNreadwriteLock_trunk_4.txt against trunk revision 997157. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/240/console This message is automatically generated.
          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.22.0 [ 12314241 ]
          dhruba borthakur made changes -
          Attachment NNreadwriteLock_trunk_4.txt [ 12454396 ]
          Hide
          dhruba borthakur added a comment -

          I implemented suresh's suggestion regarding FSPermissionChecker.checkPermission.

          Regarding the readLock on getBlockLocationsInternal: access times are precise upto an hour boundary. In many cases, the getBlockLocationsInternal does not have to update the accesstime (even though access time is enabled). That is the reason for the dance to try first with readLock, and if an access time update is required, only then acquire the writeLock.

          Show
          dhruba borthakur added a comment - I implemented suresh's suggestion regarding FSPermissionChecker.checkPermission. Regarding the readLock on getBlockLocationsInternal: access times are precise upto an hour boundary. In many cases, the getBlockLocationsInternal does not have to update the accesstime (even though access time is enabled). That is the reason for the dance to try first with readLock, and if an access time update is required, only then acquire the writeLock.
          Hide
          Suresh Srinivas added a comment -

          I agree with you that there are code portions in processReport, createSymLinkInternal, startFileInternal() that can move outside the FSNamesystem lock. However, I would like to avoid doing this code reorganizatin as part of this JIRA, especially because it makes the code difficult to review. Also, this is not a regression because the original code has all these code inside the synchronized section anyway. Please let me know if you agree on this one.

          I agree that this jira may not be the right place for this optimization. Compared to earlier code with synchronized methods, with this change, we can optimize the length of synchronized sections. We can create a bug to track this optimization.

          getBlockLocations() - I now acquire the readLock and attempt to proceed ahead. If the access-time has to be set, then I release the readLock, acquire the writeLock and start all over again

          Why not check for doAccessTime, if true grab writeLock else readLock? Make doAccessTime parameter final. This change seems much simpler - no need to repeat the initial steps such as looking for inode, computing now etc.

          removeStoredBlock - assert to be replaced by grab writeLock: I have the impression that all calls to removeStoredBlock already has the writeLock, that is the reason for the assert. Do you know of a code path via which this is not the case?

          I agree this method is not called without holding the writeLock. However assert is not turned on during run time. Also this code changes the previous semantics.

          Do you have a suggestion on how I can fix the code in FSPermissionChecker.checkPermission()?

          This method is called only from FSNamesystem. How about grabbing the readLock and then calling checkPermission without passing the root INodeDirectory?

          Show
          Suresh Srinivas added a comment - I agree with you that there are code portions in processReport, createSymLinkInternal, startFileInternal() that can move outside the FSNamesystem lock. However, I would like to avoid doing this code reorganizatin as part of this JIRA, especially because it makes the code difficult to review. Also, this is not a regression because the original code has all these code inside the synchronized section anyway. Please let me know if you agree on this one. I agree that this jira may not be the right place for this optimization. Compared to earlier code with synchronized methods, with this change, we can optimize the length of synchronized sections. We can create a bug to track this optimization. getBlockLocations() - I now acquire the readLock and attempt to proceed ahead. If the access-time has to be set, then I release the readLock, acquire the writeLock and start all over again Why not check for doAccessTime, if true grab writeLock else readLock? Make doAccessTime parameter final. This change seems much simpler - no need to repeat the initial steps such as looking for inode, computing now etc. removeStoredBlock - assert to be replaced by grab writeLock: I have the impression that all calls to removeStoredBlock already has the writeLock, that is the reason for the assert. Do you know of a code path via which this is not the case? I agree this method is not called without holding the writeLock. However assert is not turned on during run time. Also this code changes the previous semantics. Do you have a suggestion on how I can fix the code in FSPermissionChecker.checkPermission()? This method is called only from FSNamesystem. How about grabbing the readLock and then calling checkPermission without passing the root INodeDirectory?
          dhruba borthakur made changes -
          Attachment NNreadwriteLock_trunk_3.txt [ 12449221 ]
          Hide
          dhruba borthakur added a comment -

          I have purposely kept the try-catch block un-indented so that it is easier for you to review. Once the review is complete, I will upload a final patch with proper indentation.

          FSNamesystem.java
          -------------------------
          I removed the import of the ReadWriteLock. However, I kept the fsLock initializing inside the initialize method rather than moving it to the declaration. Most variables seem to be getting initialized inside the initialize() method, isn't it?

          getBlockLocations() - I now acquire the readLock and attempt to proceed ahead. If the access-time has to be set, then I release the readLock, acquire the writeLock and start all over again.
          Check for safemode should be inside the FSnamesystem lock, in fact more of it is coming in HDFS-988.
          setTimes() move isAccessTimeSupported() call out side the lock : done.

          CommitBlockSynchronization(): the key is to call the logSync() outside the FSnamesystem lock. And there is a LOG statement that prints out "src" after the call to logSync(). That is the reason why i had to declare "String src" right at the beginning of the method.

          I agree with you that there are code portions in processReport, createSymLinkInternal, startFileInternal() that can move outside the FSNamesystem lock. However, I would like to avoid doing this code reorganizatin as part of this JIRA, especially because it makes the code difficult to review. Also, this is not a regression because the original code has all these code inside the synchronized section anyway. Please let me know if you agree on this one.
          getDatanodeListForReport - move boolean and HashMap out of read lock: done.
          getSafeModeTip - readLock instead of writeLock : done

          BlockManager.java
          ------------------------
          chooseUnderReplicatedBlocks - should we grab readLock instead of writeLock? I wold prefer to keep the writeLock because this method updates member variables of the class, e.g. missingBlocksInPrevIter

          removeStoredBlock - assert to be replaced by grab writeLock: I have the impression that all calls to removeStoredBlock already has the writeLock, that is the reason for the assert. Do you know of a code path via which this is not the case?

          DecommissionManager.java - readLock instead of writeLock; The DecommissionManager.check() calls FSNamesystem.checkDecommissionStateInterna() which, in turn, invokes node.setDecommissioned(). This updates the state of the node, hence I was confortable keeping the writeLock in this code path.

          FSDirectory.java
          ---------------------
          BackupStorage.loadCheckPoint() is now fixed.
          Do you have a suggestion on how I can fix the code in FSPermissionChecker.checkPermission()?

          Show
          dhruba borthakur added a comment - I have purposely kept the try-catch block un-indented so that it is easier for you to review. Once the review is complete, I will upload a final patch with proper indentation. FSNamesystem.java ------------------------- I removed the import of the ReadWriteLock. However, I kept the fsLock initializing inside the initialize method rather than moving it to the declaration. Most variables seem to be getting initialized inside the initialize() method, isn't it? getBlockLocations() - I now acquire the readLock and attempt to proceed ahead. If the access-time has to be set, then I release the readLock, acquire the writeLock and start all over again. Check for safemode should be inside the FSnamesystem lock, in fact more of it is coming in HDFS-988 . setTimes() move isAccessTimeSupported() call out side the lock : done. CommitBlockSynchronization(): the key is to call the logSync() outside the FSnamesystem lock. And there is a LOG statement that prints out "src" after the call to logSync(). That is the reason why i had to declare "String src" right at the beginning of the method. I agree with you that there are code portions in processReport, createSymLinkInternal, startFileInternal() that can move outside the FSNamesystem lock. However, I would like to avoid doing this code reorganizatin as part of this JIRA, especially because it makes the code difficult to review. Also, this is not a regression because the original code has all these code inside the synchronized section anyway. Please let me know if you agree on this one. getDatanodeListForReport - move boolean and HashMap out of read lock: done. getSafeModeTip - readLock instead of writeLock : done BlockManager.java ------------------------ chooseUnderReplicatedBlocks - should we grab readLock instead of writeLock? I wold prefer to keep the writeLock because this method updates member variables of the class, e.g. missingBlocksInPrevIter removeStoredBlock - assert to be replaced by grab writeLock: I have the impression that all calls to removeStoredBlock already has the writeLock, that is the reason for the assert. Do you know of a code path via which this is not the case? DecommissionManager.java - readLock instead of writeLock; The DecommissionManager.check() calls FSNamesystem.checkDecommissionStateInterna() which, in turn, invokes node.setDecommissioned(). This updates the state of the node, hence I was confortable keeping the writeLock in this code path. FSDirectory.java --------------------- BackupStorage.loadCheckPoint() is now fixed. Do you have a suggestion on how I can fix the code in FSPermissionChecker.checkPermission()?
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          dhruba borthakur added a comment -

          Incorporate review comments.

          Show
          dhruba borthakur added a comment - Incorporate review comments.
          Hide
          Suresh Srinivas added a comment -

          Comments:

          1. General - try-catch block is not indented right in many places. This seems like a good idea to keep the code changes to minimum. Is it a good idea to leave this code as it is, which violates the coding standard? Should we do formatting in another jira or this jira?
          2. FSNamesystem.java
            • ReadWriteLock import not used
            • make fsLock final and initialize it in declaration?
            • getBlockLocations() - should we aquire writeLock if doAccessTime is true and readLock otherwise?
            • Move check for safemode outside the lock for all the methods
            • setTimes() move isAccessTimeSupported() call out side the lock?
            • CommitBlockSynchronization() - minor: declare String src where it is needed.
            • ProcessReport - move declaring now() and first log outside synchronized section. Similar pattern of code that can moved outside lock in methods createSymLinkInternal, startFileInternal, abandonBlock, completeFileInternal, renameToInternal, mkdirsInternal, commitBlockSynchronization, registerDatanode, startCheckpoint, endCheckpoint, updatePipeline, getDecommissioningNodes
            • getDatanodeListForReport - move boolean and HashMap out of read lock.
            • getSafeModeTip - readLock instead of writeLock?
          3. BlockManager.java
            • chooseUnderReplicatedBlocks - should we grab readLock instead of writeLock?
            • removeStoredBlock - assert to be replaced by grab writeLock
            • invalidateWorkForOneNode - safemode check outside the lock
          4. DecommissionManager.java - readLock instead of writeLock?
          5. FSDirectory.java
            • ReadWriteLock import not used
            • bLock and cond must be final variables and initialize in declaration?
            • The new lock introduced replaces previous synchronization on FSDirectory object and FSDirectory.root object
            • FSDirectory.root synchronization is still used in BackupStorage.loadCheckPoint()
            • FSPermissionChecker.checkPermission is synchronized on INodeDirectory, which happens to be FSDirectory.root. This is an ugly code to change to use the lock given the method could get any other INodeDirectory.
          Show
          Suresh Srinivas added a comment - Comments: General - try-catch block is not indented right in many places. This seems like a good idea to keep the code changes to minimum. Is it a good idea to leave this code as it is, which violates the coding standard? Should we do formatting in another jira or this jira? FSNamesystem.java ReadWriteLock import not used make fsLock final and initialize it in declaration? getBlockLocations() - should we aquire writeLock if doAccessTime is true and readLock otherwise? Move check for safemode outside the lock for all the methods setTimes() move isAccessTimeSupported() call out side the lock? CommitBlockSynchronization() - minor: declare String src where it is needed. ProcessReport - move declaring now() and first log outside synchronized section. Similar pattern of code that can moved outside lock in methods createSymLinkInternal, startFileInternal, abandonBlock, completeFileInternal, renameToInternal, mkdirsInternal, commitBlockSynchronization, registerDatanode, startCheckpoint, endCheckpoint, updatePipeline, getDecommissioningNodes getDatanodeListForReport - move boolean and HashMap out of read lock. getSafeModeTip - readLock instead of writeLock? BlockManager.java chooseUnderReplicatedBlocks - should we grab readLock instead of writeLock? removeStoredBlock - assert to be replaced by grab writeLock invalidateWorkForOneNode - safemode check outside the lock DecommissionManager.java - readLock instead of writeLock? FSDirectory.java ReadWriteLock import not used bLock and cond must be final variables and initialize in declaration? The new lock introduced replaces previous synchronization on FSDirectory object and FSDirectory.root object FSDirectory.root synchronization is still used in BackupStorage.loadCheckPoint() FSPermissionChecker.checkPermission is synchronized on INodeDirectory, which happens to be FSDirectory.root. This is an ugly code to change to use the lock given the method could get any other INodeDirectory.
          Hide
          dhruba borthakur added a comment -

          The latest patch looks good, thanks Suresh for merging it with trunk.

          Show
          dhruba borthakur added a comment - The latest patch looks good, thanks Suresh for merging it with trunk.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12448598/NNreadwriteLock_trunk_2.txt
          against trunk revision 959874.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/422/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/422/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/422/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/422/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448598/NNreadwriteLock_trunk_2.txt against trunk revision 959874. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/422/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/422/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/422/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/422/console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12448598/NNreadwriteLock_trunk_2.txt
          against trunk revision 959874.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/209/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/209/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/209/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/209/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448598/NNreadwriteLock_trunk_2.txt against trunk revision 959874. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/209/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/209/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/209/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/209/console This message is automatically generated.
          Suresh Srinivas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Suresh Srinivas made changes -
          Attachment NNreadwriteLock_trunk_2.txt [ 12448598 ]
          Hide
          Suresh Srinivas added a comment -

          Attaching a new merged patch. I had to merge it, before I could code review. Dhruba see if I missed any thing. I will post comments soon.

          Show
          Suresh Srinivas added a comment - Attaching a new merged patch. I had to merge it, before I could code review. Dhruba see if I missed any thing. I will post comments soon.
          Suresh Srinivas made changes -
          Attachment NNreadwriteLock_trunk_2.txt [ 12448597 ]
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Suresh Srinivas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Suresh Srinivas made changes -
          Attachment NNreadwriteLock_trunk_2.txt [ 12448597 ]
          Hide
          Suresh Srinivas added a comment -

          Attaching a new merged patch. I had to merge it, before I could code review. Dhruba see if I missed any thing. I will post comments soon.

          Show
          Suresh Srinivas added a comment - Attaching a new merged patch. I had to merge it, before I could code review. Dhruba see if I missed any thing. I will post comments soon.
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12446334/NNreadwriteLock_trunk_1.txt
          against trunk revision 959874.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/420/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446334/NNreadwriteLock_trunk_1.txt against trunk revision 959874. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/420/console This message is automatically generated.
          Steve Loughran made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Steve Loughran made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Steve Loughran added a comment -

          cancelling to rerun the patch and see the test results

          Show
          Steve Loughran added a comment - cancelling to rerun the patch and see the test results
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12446334/NNreadwriteLock_trunk_1.txt
          against trunk revision 951178.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/395/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/395/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/395/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/395/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446334/NNreadwriteLock_trunk_1.txt against trunk revision 951178. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/395/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/395/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/395/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/395/console This message is automatically generated.
          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          dhruba borthakur made changes -
          Attachment NNreadwriteLock_trunk_1.txt [ 12446334 ]
          Hide
          dhruba borthakur added a comment -

          This patch changes the FSNamesystem global lock to a reader/write lock. This patch is for HDFS trunk. The code change is simple, but the impact is huge! I would like somebody to review this patch, but I will also post performance numbers on it in about a week.

          Show
          dhruba borthakur added a comment - This patch changes the FSNamesystem global lock to a reader/write lock. This patch is for HDFS trunk. The code change is simple, but the impact is huge! I would like somebody to review this patch, but I will also post performance numbers on it in about a week.
          Hide
          dhruba borthakur added a comment -

          The numbers I listed were purely from the RPC module. In that case, each createfile rpc could have created multiple components in the specified path. similarly, a delete cal could have been recursive and could have deleted multiple paths recursively.

          Show
          dhruba borthakur added a comment - The numbers I listed were purely from the RPC module. In that case, each createfile rpc could have created multiple components in the specified path. similarly, a delete cal could have been recursive and could have deleted multiple paths recursively.
          Hide
          Konstantin Shvachko added a comment -

          createfiles 3%, deletefiles 1%. Oh, that's why the clusters grow .
          Does 1% for deletes counts directories and files? If people delete large directories then may be there is more than 1% of file deletes.

          Show
          Konstantin Shvachko added a comment - createfiles 3%, deletefiles 1%. Oh, that's why the clusters grow . Does 1% for deletes counts directories and files? If people delete large directories then may be there is more than 1% of file deletes.
          Hide
          dhruba borthakur added a comment -

          A typical analysis of the workload in our cluster shows that the following types of RPC are processed by the namenode:

          listStatus 47%
          openfiles 42%
          createfiles 3%
          mkdirs 3%
          rename 2%
          deletefiles 1%

          Show
          dhruba borthakur added a comment - A typical analysis of the workload in our cluster shows that the following types of RPC are processed by the namenode: listStatus 47% openfiles 42% createfiles 3% mkdirs 3% rename 2% deletefiles 1%
          dhruba borthakur made changes -
          Attachment NNreadwriteLock.txt [ 12442824 ]
          Hide
          dhruba borthakur added a comment -

          Patch for hadoop 0.20.

          Show
          dhruba borthakur added a comment - Patch for hadoop 0.20.
          dhruba borthakur made changes -
          Assignee Dmytro Molkov [ dms ] dhruba borthakur [ dhruba ]
          Hide
          Todd Lipcon added a comment -

          Looks like we actually already filed this - HDFS-980 - we should close one or the other as dup

          Show
          Todd Lipcon added a comment - Looks like we actually already filed this - HDFS-980 - we should close one or the other as dup
          Hide
          Todd Lipcon added a comment -

          +1 on this idea. I also have ReentrantReadWriteLock support somewhat working in jcarder now, so we can still detect deadlocks.

          Show
          Todd Lipcon added a comment - +1 on this idea. I also have ReentrantReadWriteLock support somewhat working in jcarder now, so we can still detect deadlocks.
          dhruba borthakur made changes -
          Field Original Value New Value
          Assignee dhruba borthakur [ dhruba ] Dmytro Molkov [ dms ]
          Component/s name-node [ 12312926 ]
          dhruba borthakur created issue -

            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
            • Votes:
              1 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development