Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4777

File creation with overwrite flag set to true results in logSync holding namesystem lock

    Details

    • Type: Bug Bug
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0, 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:
      None

      Description

      FSNamesystem#startFileInternal calls delete. Delete method releases the write lock, making parts of startFileInternal code unintentionally executed without write lock being held.

      1. HDFS-4777.patch
        8 kB
        Suresh Srinivas

        Issue Links

          Activity

          Hide
          Colin Patrick McCabe added a comment -

          I might be missing something here, but I don't see why "Holding the lock when calling FSNS#logSyncAll when entering safe mode" is necessary. Can't we just move the logSync to after safeMode has been set?

          Conceptually, I don't see why setting safe mode should require logSync to succeed first. Surely we don't want setting safe mode to fail because logSync failed? So we can just do it at the end of enterSafeMode, after releasing the write lock.

          Show
          Colin Patrick McCabe added a comment - I might be missing something here, but I don't see why "Holding the lock when calling FSNS#logSyncAll when entering safe mode" is necessary. Can't we just move the logSync to after safeMode has been set? Conceptually, I don't see why setting safe mode should require logSync to succeed first. Surely we don't want setting safe mode to fail because logSync failed? So we can just do it at the end of enterSafeMode, after releasing the write lock.
          Hide
          Suresh Srinivas added a comment -

          I am going to mark this as Major and follow up after 2.1.0-beta.

          Show
          Suresh Srinivas added a comment - I am going to mark this as Major and follow up after 2.1.0-beta.
          Hide
          Daryn Sharp added a comment -

          Secret manager updating master key does not depend on FSNamesystem lock

          As far as I can tell, no secret manager activity requires the fsn lock. One of the many things overflowing from my plate...

          Show
          Daryn Sharp added a comment - Secret manager updating master key does not depend on FSNamesystem lock As far as I can tell, no secret manager activity requires the fsn lock. One of the many things overflowing from my plate...
          Hide
          Aaron T. Myers added a comment -

          My preference is to not logSync at all with FSNamesystem lock held.

          No disagreement there - just pointing out that it might be a little difficult to do in the case of entering safemode, and pointing out that it seems there's actually relatively few places that need to be addressed in order to get those tests passing.

          Show
          Aaron T. Myers added a comment - My preference is to not logSync at all with FSNamesystem lock held. No disagreement there - just pointing out that it might be a little difficult to do in the case of entering safemode, and pointing out that it seems there's actually relatively few places that need to be addressed in order to get those tests passing.
          Hide
          Suresh Srinivas added a comment -

          Aaron T. Myers My preference is to not logSync at all with FSNamesystem lock held. That way we do not have two mechanisms where it is allowed both with lock held and in some cases lock not held and the first one gets abused. That said, the NN lock mechanism is unpredictable and sometime the code that says NN lock is not needed, gets called with lock held, through complex set of method calls. Unspaghettifying this code is non-trivial. Case in question:

          • Secret manager updating master key does not depend on FSNamesystem lock
          • However same set of methods get called while calling FSNamesystem#startCommonServices() calls same set of methods and it is done with NN lock held. That is the second one you are saying needs to be fixed.
          Show
          Suresh Srinivas added a comment - Aaron T. Myers My preference is to not logSync at all with FSNamesystem lock held. That way we do not have two mechanisms where it is allowed both with lock held and in some cases lock not held and the first one gets abused. That said, the NN lock mechanism is unpredictable and sometime the code that says NN lock is not needed, gets called with lock held, through complex set of method calls. Unspaghettifying this code is non-trivial. Case in question: Secret manager updating master key does not depend on FSNamesystem lock However same set of methods get called while calling FSNamesystem#startCommonServices() calls same set of methods and it is done with NN lock held. That is the second one you are saying needs to be fixed.
          Hide
          Aaron T. Myers added a comment -

          Thanks a lot for working on this, Suresh.

          From a quick look it seems like most (all?) of these test failures fall into three categories:

          1. Holding the lock during sync because of the startFile/deleteInternal issue you initially identified in this JIRA.
          2. Holding the lock when calling FSNS#logUpdateMasterKey when updating token secret keys.
          3. Holding the lock when calling FSNS#logSyncAll when entering safe mode.

          The first two definitely seem like they should be fixed. The third I'm not so sure can/should be fixed, since we're not really concerned about NN operation throughput when entering safemode, and it does seem to be a good idea to call logSyncAll when entering SM.

          Certainly feel free to make sub-tasks if you want, though it doesn't seem crazy to me to do this in one JIRA. I don't think it will actually require changing all that many places.

          Show
          Aaron T. Myers added a comment - Thanks a lot for working on this, Suresh. From a quick look it seems like most (all?) of these test failures fall into three categories: Holding the lock during sync because of the startFile/deleteInternal issue you initially identified in this JIRA. Holding the lock when calling FSNS#logUpdateMasterKey when updating token secret keys. Holding the lock when calling FSNS#logSyncAll when entering safe mode. The first two definitely seem like they should be fixed. The third I'm not so sure can/should be fixed, since we're not really concerned about NN operation throughput when entering safemode, and it does seem to be a good idea to call logSyncAll when entering SM. Certainly feel free to make sub-tasks if you want, though it doesn't seem crazy to me to do this in one JIRA. I don't think it will actually require changing all that many places.
          Hide
          Suresh Srinivas added a comment -

          If we need to fix all these failures in code paths where logSync is called holding FSNamesystem read or write lock, we need to cleanup a lot of code. I do not think it is possible in a single jira. I plan on making this jira an umbrella jira and fix one issue at a time.

          Show
          Suresh Srinivas added a comment - If we need to fix all these failures in code paths where logSync is called holding FSNamesystem read or write lock, we need to cleanup a lot of code. I do not think it is possible in a single jira. I plan on making this jira an umbrella jira and fix one issue at a time.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12581106/HDFS-4777.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite
          org.apache.hadoop.hdfs.TestSafeMode
          org.apache.hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot
          org.apache.hadoop.hdfs.TestParallelUnixDomainRead
          org.apache.hadoop.hdfs.TestLease
          org.apache.hadoop.hdfs.TestLeaseRecovery2
          org.apache.hadoop.hdfs.TestLeaseRecovery
          org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract
          org.apache.hadoop.fs.TestResolveHdfsSymlink
          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
          org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
          org.apache.hadoop.hdfs.server.namenode.TestCheckPointForSecurityTokens
          org.apache.hadoop.hdfs.server.namenode.TestNameNodeResourceChecker
          org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode
          org.apache.hadoop.hdfs.TestDFSShell
          org.apache.hadoop.hdfs.TestFileCreation
          org.apache.hadoop.hdfs.TestDFSFinalize
          org.apache.hadoop.hdfs.TestDFSStartupVersions
          org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer
          org.apache.hadoop.hdfs.TestDataTransferProtocol
          org.apache.hadoop.hdfs.TestDFSClientRetries
          org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
          org.apache.hadoop.hdfs.TestMiniDFSCluster
          org.apache.hadoop.fs.viewfs.TestViewFsAtHdfsRoot
          org.apache.hadoop.hdfs.server.namenode.TestSaveNamespace
          org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
          org.apache.hadoop.hdfs.TestParallelShortCircuitReadUnCached
          org.apache.hadoop.hdfs.TestFetchImage
          org.apache.hadoop.hdfs.TestDFSRollback
          org.apache.hadoop.fs.viewfs.TestViewFileSystemHdfs
          org.apache.hadoop.fs.TestHDFSFileContextMainOperations
          org.apache.hadoop.hdfs.TestParallelShortCircuitLegacyRead
          org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing
          org.apache.hadoop.hdfs.server.namenode.TestSecurityTokenEditLog
          org.apache.hadoop.cli.TestHDFSCLI
          org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks
          org.apache.hadoop.hdfs.server.namenode.TestStartup
          org.apache.hadoop.hdfs.server.namenode.TestINodeFile
          org.apache.hadoop.fs.viewfs.TestViewFsHdfs
          org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
          org.apache.hadoop.hdfs.TestHDFSFileSystemContract
          org.apache.hadoop.hdfs.security.TestDelegationToken
          org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionFunctional
          org.apache.hadoop.hdfs.TestDFSPermission
          org.apache.hadoop.hdfs.TestParallelShortCircuitReadNoChecksum
          org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser
          org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
          org.apache.hadoop.hdfs.TestParallelShortCircuitRead
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
          org.apache.hadoop.hdfs.web.TestWebHDFS
          org.apache.hadoop.hdfs.TestParallelRead
          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
          org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
          org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
          org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
          org.apache.hadoop.hdfs.TestDFSUpgrade
          org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs
          org.apache.hadoop.hdfs.server.datanode.TestHSync

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4342//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4342//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581106/HDFS-4777.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite org.apache.hadoop.hdfs.TestSafeMode org.apache.hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot org.apache.hadoop.hdfs.TestParallelUnixDomainRead org.apache.hadoop.hdfs.TestLease org.apache.hadoop.hdfs.TestLeaseRecovery2 org.apache.hadoop.hdfs.TestLeaseRecovery org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.fs.TestResolveHdfsSymlink org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby org.apache.hadoop.hdfs.server.namenode.TestCheckPointForSecurityTokens org.apache.hadoop.hdfs.server.namenode.TestNameNodeResourceChecker org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestFileCreation org.apache.hadoop.hdfs.TestDFSFinalize org.apache.hadoop.hdfs.TestDFSStartupVersions org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer org.apache.hadoop.hdfs.TestDataTransferProtocol org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.TestMiniDFSCluster org.apache.hadoop.fs.viewfs.TestViewFsAtHdfsRoot org.apache.hadoop.hdfs.server.namenode.TestSaveNamespace org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.TestParallelShortCircuitReadUnCached org.apache.hadoop.hdfs.TestFetchImage org.apache.hadoop.hdfs.TestDFSRollback org.apache.hadoop.fs.viewfs.TestViewFileSystemHdfs org.apache.hadoop.fs.TestHDFSFileContextMainOperations org.apache.hadoop.hdfs.TestParallelShortCircuitLegacyRead org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing org.apache.hadoop.hdfs.server.namenode.TestSecurityTokenEditLog org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks org.apache.hadoop.hdfs.server.namenode.TestStartup org.apache.hadoop.hdfs.server.namenode.TestINodeFile org.apache.hadoop.fs.viewfs.TestViewFsHdfs org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.TestHDFSFileSystemContract org.apache.hadoop.hdfs.security.TestDelegationToken org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionFunctional org.apache.hadoop.hdfs.TestDFSPermission org.apache.hadoop.hdfs.TestParallelShortCircuitReadNoChecksum org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestParallelShortCircuitRead org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.web.TestWebHDFS org.apache.hadoop.hdfs.TestParallelRead org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster org.apache.hadoop.hdfs.TestDFSUpgrade org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs org.apache.hadoop.hdfs.server.datanode.TestHSync +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4342//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4342//console This message is automatically generated.
          Hide
          Suresh Srinivas added a comment -

          Changes in the patch:

          1. I am going to make a change I was planning to do for a long time. The attached patch has an assert which fails when editlog sync is called with either FSNamesystem read or write lock held. This should help in catching any accidental introduction of code which calls logSync in critical sections.
          2. This patch also modifies TestFileCreation to create a file with overwrite flag set when a file of that name already exists. This should trigger the logSync assert to fail.

          In a subsequent patch, I am going to post a fix with which the test failure should not occur.

          Show
          Suresh Srinivas added a comment - Changes in the patch: I am going to make a change I was planning to do for a long time. The attached patch has an assert which fails when editlog sync is called with either FSNamesystem read or write lock held. This should help in catching any accidental introduction of code which calls logSync in critical sections. This patch also modifies TestFileCreation to create a file with overwrite flag set when a file of that name already exists. This should trigger the logSync assert to fail. In a subsequent patch, I am going to post a fix with which the test failure should not occur.
          Hide
          Suresh Srinivas added a comment -

          Digging a little bit deeper, it turns out that my understanding of the lock is not correct. Here is how it works:

          1. FSNameystem#startFileInt - acquires writeLock - the hold count for the lock is 1
          2. FSNamesystem#deleteInternal - acquires writeLock again - since the thread holds the lock already the hold count is incremented to 2
          3. FSNamesystem#deleteInternal - releases writeLock. Hold count is decremented to 1
          4. FSNamesystem#deleteInternal - calls logSync(). The editlog logSync now is being done with the FSNamesystem lock held
          5. FSNamesystem#startFileInt release writeLock. Lock hold count is down to zero and it is released.

          Based on this I am going to update the jira header.

          Show
          Suresh Srinivas added a comment - Digging a little bit deeper, it turns out that my understanding of the lock is not correct. Here is how it works: FSNameystem#startFileInt - acquires writeLock - the hold count for the lock is 1 FSNamesystem#deleteInternal - acquires writeLock again - since the thread holds the lock already the hold count is incremented to 2 FSNamesystem#deleteInternal - releases writeLock. Hold count is decremented to 1 FSNamesystem#deleteInternal - calls logSync(). The editlog logSync now is being done with the FSNamesystem lock held FSNamesystem#startFileInt release writeLock. Lock hold count is down to zero and it is released. Based on this I am going to update the jira header.
          Hide
          Suresh Srinivas added a comment -

          FSNamesystem calls delete:

                  // File exists - must be one of append or overwrite
                  if (overwrite) {
                    delete(src, true);
                  } else {
          

          and deleteInternal() release the lock.

          Show
          Suresh Srinivas added a comment - FSNamesystem calls delete: // File exists - must be one of append or overwrite if (overwrite) { delete(src, true ); } else { and deleteInternal() release the lock.

            People

            • Assignee:
              Suresh Srinivas
              Reporter:
              Suresh Srinivas
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:

                Development