Hadoop HDFS > HDFS-1623 High Availability Framework for HDFS NN > HDFS-2909

HA: Inaccessible shared edits dir not getting removed from FSImage storage dirs upon error

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: None
    • Component/s: ha, namenode
    • Labels:
      None

        Activity

        Aaron T. Myers added a comment -

        Hey Bikas, could you provide some more context for this in the description of the JIRA?

        Bikas Saha added a comment -

        Repro steps:
        1) Start 2 NNs in active/standby mode.
        2) Remove write permissions from the shared edits dir.
        3) When the standby triggers a log roll, the active NN gets an error while finalizing the edit logs.
        4) The exception is caught far up the stack, and the error is never reported against the bad shared edits dir.

        This happens because errors are reported only when FSImage.rollEditLogs() calls storage.writeTransactionIdFileToStorage(), which runs after FSEditLog.rollEditLogs(). The failure in FSEditLog.rollEditLogs() raises an exception that FSImage.rollEditLogs() does not handle, so storage.writeTransactionIdFileToStorage() is never called and no error is reported. The bad directory therefore remains in FSImage.storage.
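The ordering problem described above can be sketched as a small runnable model. This is a minimal sketch under stated assumptions: Storage, EditLog and writeTxIdToAllDirs are hypothetical stand-ins for NNStorage, FSEditLog and storage.writeTransactionIdFileToStorage(), not the real Hadoop classes, and the "unwritable" set simulates the removed write permission. It shows that when the roll throws, the step that would drop the bad directory never runs.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, heavily simplified model of the call order described above.
// None of these classes are the real Hadoop ones.
final class RollOrderSketch {

    /** Stand-in for NNStorage: tracks which storage dirs are still considered usable. */
    static final class Storage {
        final Set<String> dirs = new LinkedHashSet<>();
        final Set<String> unwritable = new LinkedHashSet<>(); // simulated permission problem

        Storage(String... names) {
            for (String n : names) dirs.add(n);
        }

        /** Stand-in for writeTransactionIdFileToStorage(): failing dirs are dropped here. */
        void writeTxIdToAllDirs(long txid) {
            // Marking bad dirs happens as a side effect of this write.
            dirs.removeIf(unwritable::contains);
        }
    }

    /** Stand-in for FSEditLog: rolling fails if any journal dir is unwritable. */
    static final class EditLog {
        final Storage storage;
        long segmentTxId = 1;

        EditLog(Storage storage) { this.storage = storage; }

        void rollEditLogs() throws IOException {
            for (String d : storage.dirs) {
                if (storage.unwritable.contains(d)) {
                    throw new IOException("cannot finalize segment in " + d);
                }
            }
            segmentTxId++;
        }
    }

    /** Stand-in for FSImage.rollEditLogs(): roll first, then record the txid. */
    static void rollEditLogs(EditLog log, Storage storage) throws IOException {
        log.rollEditLogs();                          // throws for the bad shared dir...
        storage.writeTxIdToAllDirs(log.segmentTxId); // ...so this line never runs
    }

    public static void main(String[] args) {
        Storage storage = new Storage("local", "shared");
        storage.unwritable.add("shared");
        try {
            rollEditLogs(new EditLog(storage), storage);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("storage dirs: " + storage.dirs);
    }
}
```

Running main prints the caught exception followed by "storage dirs: [local, shared]", i.e. the shared dir is still listed as usable.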

        Todd Lipcon added a comment -

        It seems that, when rollEditLogs fails in the shared dir, the NN should abort. Any idea why it isn't aborting?

        Bikas Saha added a comment -

        This is happening because JournalSet.mapJournalsAndReportErrors() calls abortAllJournals() and throws a new IOException when a required journal (in this case, the shared dir) fails. I still have to find out why the NN continues to run as active after this.
        Coming back to the above, the abortAllJournals() code seems to imply that the NN should stop running when something like this happens. That would mean that losing access to the single shared edits dir causes the active NN to shut down. Most likely the standby NN will not be able to access the shared edits dir either, which means the shared edits dir has become a single point of failure for the HA service.
        Still looking at why the NN did not abort.
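A hypothetical sketch of the error policy described above, assuming that a failing optional journal is simply disabled while a failing required journal aborts every journal and rethrows. JournalSetSketch, Journal and JournalOp are illustrative names, not the real JournalSet API. Note that the method only throws; nothing in it stops the process, which is the question the comment leaves open.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of the policy described above; not Hadoop's JournalSet.
final class JournalSetSketch {

    /** An operation applied to each journal (e.g. finalizing a log segment). */
    interface JournalOp {
        void apply(Journal j) throws IOException;
    }

    static final class Journal {
        final String name;
        final boolean required; // e.g. the shared edits dir
        boolean broken;         // simulated I/O failure
        boolean aborted;
        boolean disabled;

        Journal(String name, boolean required) {
            this.name = name;
            this.required = required;
        }
    }

    final List<Journal> journals = new ArrayList<>();

    void abortAllJournals() {
        for (Journal j : journals) j.aborted = true;
    }

    /** Applies op to every enabled journal. A failed optional journal is disabled;
     *  a failed required journal aborts all journals and rethrows to the caller. */
    void mapJournalsAndReportErrors(JournalOp op, String status) throws IOException {
        for (Journal j : journals) {
            if (j.disabled) continue;
            try {
                op.apply(j);
            } catch (IOException e) {
                if (j.required) {
                    abortAllJournals();
                    throw new IOException(status + " failed for required journal " + j.name, e);
                }
                j.disabled = true; // optional journal: drop it and keep going
            }
        }
    }

    public static void main(String[] args) {
        JournalSetSketch set = new JournalSetSketch();
        Journal local = new Journal("local", false);
        Journal shared = new Journal("shared", true);
        shared.broken = true;
        set.journals.add(local);
        set.journals.add(shared);
        try {
            set.mapJournalsAndReportErrors(j -> {
                if (j.broken) throw new IOException("permission denied: " + j.name);
            }, "finalize log segment");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("local aborted=" + local.aborted + ", shared aborted=" + shared.aborted);
    }
}
```

With the shared journal marked as required and broken, main prints the rethrown message and shows that both journals were aborted; the caller is still free to keep running.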

        Bikas Saha added a comment -

        The NN did not abort because it simply threw an IOException after calling abortAllJournals(). The RPC server translated the IOException into a ServiceException and sent it back to the client. So the NN kept running and still considered itself active.
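The observed behavior can be sketched like this. Assumptions: NameNode, HAState and handleRollRequest are hypothetical stand-ins, and the ServiceException reply is modeled as a plain "ERROR: ..." string so the example needs no protobuf dependency. The point is that converting the exception into a reply leaves the process running with its HA state unchanged.

```java
import java.io.IOException;

// Hypothetical sketch of why the NN kept running: the RPC layer turns the
// IOException into an error reply for the caller. Names are illustrative only.
final class RpcTranslationSketch {

    enum HAState { ACTIVE, STANDBY }

    static final class NameNode {
        HAState state = HAState.ACTIVE;
        boolean sharedDirWritable = false; // simulated permission problem

        /** Stand-in for the roll triggered by the standby. */
        void rollEditLog() throws IOException {
            if (!sharedDirWritable) {
                // Journals were aborted before this point, but nothing stops the process.
                throw new IOException("required journal failed: shared edits dir");
            }
        }
    }

    /** Stand-in for the RPC server: every IOException becomes an error reply. */
    static String handleRollRequest(NameNode nn) {
        try {
            nn.rollEditLog();
            return "OK";
        } catch (IOException e) {
            // Sent back to the client; the NameNode's state is left untouched.
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        System.out.println(handleRollRequest(nn));
        System.out.println("NN state after failed roll: " + nn.state);
    }
}
```

Running main prints the error reply and then "NN state after failed roll: ACTIVE".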

        Bikas Saha added a comment -

        Aside from all of the above, I see another issue.
        Say everything is healthy and FSImage.rollEditLogs() is called.
        It first calls FSEditLog.rollEditLogs(), which actually rolls the logs.
        It then calls storage.writeTransactionIdFileToStorage(), which records the roll in all storage dirs so that the information about the rolled edits is not lost.
        However, the NN could crash after FSEditLog.rollEditLogs() has completed and before storage.writeTransactionIdFileToStorage() is called. That might leave the data in an inconsistent state.

        Todd Lipcon added a comment -

        Say everything is healthy and FSImage.rollEditLogs() is called.
        It first calls FSEditLog.rollEditLogs(), which actually rolls the logs.
        It then calls storage.writeTransactionIdFileToStorage(), which records the roll in all storage dirs so that the information about the rolled edits is not lost.
        However, the NN could crash after FSEditLog.rollEditLogs() has completed and before storage.writeTransactionIdFileToStorage() is called. That might leave the data in an inconsistent state.

        I don't think this inconsistent state is problematic. The requirement is that we don't log any actual edits to the new edit log until it's been recorded in all of the storage directories. In the case of the crash you described, you might be able to start up without the new edit log segment, but that edit log segment would be empty anyway.
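Todd's argument can be illustrated with a toy model, assuming that a restart ignores any segment whose start txid was never recorded in the storage dirs. RollCrashWindowSketch and its fields are hypothetical; recordedTxId stands in for what storage.writeTransactionIdFileToStorage() persists. Because logEdit() refuses to write into an unrecorded segment, a crash between the two roll steps can only strand an empty segment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the invariant described above: a new segment may not
// receive edits until its start txid has been recorded in every storage dir.
final class RollCrashWindowSketch {

    static final class Segment {
        final long firstTxId;
        final List<String> edits = new ArrayList<>();

        Segment(long firstTxId) { this.firstTxId = firstTxId; }
    }

    final List<Segment> segments = new ArrayList<>();
    long nextTxId = 1;
    long recordedTxId = 1; // the first segment is already recorded

    RollCrashWindowSketch() { segments.add(new Segment(nextTxId)); }

    Segment current() { return segments.get(segments.size() - 1); }

    /** Logs an edit, but only into a segment that has been recorded everywhere. */
    void logEdit(String op) {
        if (current().firstTxId > recordedTxId) {
            throw new IllegalStateException("segment not yet recorded in all storage dirs");
        }
        current().edits.add(op);
        nextTxId++;
    }

    /** Rolls the log; crashBeforeRecord simulates dying between the two steps. */
    void roll(boolean crashBeforeRecord) {
        segments.add(new Segment(nextTxId)); // step 1: roll to a new, empty segment
        if (crashBeforeRecord) {
            return;                          // simulated crash
        }
        recordedTxId = current().firstTxId;  // step 2: record the new segment in all dirs
    }

    /** Edits a restart would miss if it skipped segments newer than the recorded txid. */
    int editsInUnrecordedSegments() {
        int n = 0;
        for (Segment s : segments) {
            if (s.firstTxId > recordedTxId) n += s.edits.size();
        }
        return n;
    }

    public static void main(String[] args) {
        RollCrashWindowSketch nn = new RollCrashWindowSketch();
        nn.logEdit("mkdir /a");
        nn.logEdit("mkdir /b");
        nn.roll(true); // crash between rolling and recording
        System.out.println("edits at risk: " + nn.editsInUnrecordedSegments());
    }
}
```

Running main prints "edits at risk: 0": the only unrecorded segment is the freshly rolled, empty one.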

        Bikas Saha added a comment -

        Yes. That is what I found when I ran the experiment myself. The NN can restart OK after this event.

        The requirement is that we don't log any actual edits to the new edit log until it's been recorded in all of the storage directories.

        Does this mean that the NN will stop further edits until that storage dir is restored? Or it will create a new edit log, record it in the remaining healthy dirs and go on from there?

        Bikas Saha added a comment -

        Whether storage.writeTransactionIdFileToStorage() is called depends on whether IOExceptions are swallowed somewhere inside FSEditLog.rollEditLogs(). There does not seem to be any defined rule for when storage.writeTransactionIdFileToStorage() should or should not be called. Marking bad storage directories is a side effect of calling storage.writeTransactionIdFileToStorage().
        Also, JournalSet works independently of storage directory state.

        Bikas Saha added a comment -

        Attaching a patch that syncs FileJournalManager with NNStorage. When FileJournalManager hits an I/O error, it reports the error against its StorageDirectory to NNStorage, so errors in the file journal now get reported to the NNStorage dirs.
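A minimal sketch of the reporting half described above. ErrorReporter, Storage, FileJournal and reportErrorOnDir are hypothetical names chosen for illustration; the actual patch's interfaces may differ. The key step is that the journal reports the failing directory to the storage layer before rethrowing, so the directory is dropped even though the caller's later steps never run.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the patch's idea: the file-based journal tells the storage
// layer when I/O on its directory fails, so the dir is dropped from the healthy set.
// ErrorReporter, Storage and FileJournal are illustrative names, not Hadoop classes.
final class JournalErrorReportingSketch {

    interface ErrorReporter {
        void reportErrorOnDir(String dir);
    }

    /** Stand-in for NNStorage. */
    static final class Storage implements ErrorReporter {
        final Set<String> healthy = new LinkedHashSet<>();
        final Set<String> removed = new LinkedHashSet<>();

        Storage(String... dirs) {
            for (String d : dirs) healthy.add(d);
        }

        @Override
        public void reportErrorOnDir(String dir) {
            if (healthy.remove(dir)) removed.add(dir);
        }
    }

    /** Stand-in for FileJournalManager bound to one storage directory. */
    static final class FileJournal {
        final String dir;
        final ErrorReporter reporter;
        boolean writable = true; // simulated permission state

        FileJournal(String dir, ErrorReporter reporter) {
            this.dir = dir;
            this.reporter = reporter;
        }

        void finalizeSegment(long firstTxId, long lastTxId) throws IOException {
            try {
                renameInProgressSegment(firstTxId, lastTxId);
            } catch (IOException e) {
                reporter.reportErrorOnDir(dir); // report the bad dir before propagating
                throw e;
            }
        }

        private void renameInProgressSegment(long firstTxId, long lastTxId) throws IOException {
            // Placeholder for the real file operation; fails when the dir is not writable.
            if (!writable) {
                throw new IOException("cannot finalize segment " + firstTxId + "-" + lastTxId + " in " + dir);
            }
        }
    }

    public static void main(String[] args) {
        Storage storage = new Storage("local", "shared");
        FileJournal shared = new FileJournal("shared", storage);
        shared.writable = false;
        try {
            shared.finalizeSegment(1, 10);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("healthy=" + storage.healthy + " removed=" + storage.removed);
    }
}
```

Here main prints the caught exception and then "healthy=[local] removed=[shared]".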

        This does only half of the sync. IMO, FileJournalManager should also check the state of its StorageDirectory inside NNStorage before making changes. That would complete the sync between FileJournalManager and NNStorage: when a StorageDirectory gets removed or restored, FileJournalManager could stop or resume writing to it.
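The proposed second half could look roughly like this, assuming the storage layer exposes a simple per-directory health check. All names here (Storage, isHealthy, markRemoved, markRestored, FileJournal) are hypothetical, and a real implementation would also have to reopen a log segment when a directory comes back, which this sketch skips.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed second half: the journal consults the
// storage layer's view of its directory before every write.
final class JournalHealthCheckSketch {

    /** Stand-in for NNStorage's notion of which dirs are currently usable. */
    static final class Storage {
        final Set<String> removed = new HashSet<>();

        boolean isHealthy(String dir) { return !removed.contains(dir); }
        void markRemoved(String dir) { removed.add(dir); }
        void markRestored(String dir) { removed.remove(dir); }
    }

    /** Stand-in for FileJournalManager bound to one storage directory. */
    static final class FileJournal {
        final String dir;
        final Storage storage;
        int writes;

        FileJournal(String dir, Storage storage) {
            this.dir = dir;
            this.storage = storage;
        }

        /** Returns true if the write happened, false if skipped because the dir is removed. */
        boolean write(String op) {
            if (!storage.isHealthy(dir)) {
                return false; // dir removed from storage: stop writing to it
            }
            writes++;         // placeholder for the real append of op
            return true;
        }
    }

    public static void main(String[] args) {
        Storage storage = new Storage();
        FileJournal j = new FileJournal("shared", storage);
        System.out.println(j.write("op1")); // true
        storage.markRemoved("shared");
        System.out.println(j.write("op2")); // false
        storage.markRestored("shared");
        System.out.println(j.write("op3")); // true
        System.out.println("writes=" + j.writes); // writes=2
    }
}
```

Checking on every write keeps the journal's behavior tied to a single source of truth (the storage layer), at the cost of a lookup per operation.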

        Jitendra Nath Pandey added a comment -

        +1. lgtm

        Bikas Saha added a comment -

        Ran all tests under hadoop-hdfs and they pass.

        Jitendra Nath Pandey added a comment -

        Committed to the branch. Thanks to Bikas!

        Bikas Saha added a comment -

        Thanks!

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-HAbranch-build #79 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/79/)
        HDFS-2909. HA: Inaccessible shared edits dir not getting removed from FSImage storage dirs upon error. Contributed by Bikas Saha. (Revision 1244753)

        Result = FAILURE
        jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1244753
        Files :

        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
        • /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java

          People

          • Assignee: Bikas Saha
          • Reporter: Bikas Saha
          • Votes: 0
          • Watchers: 3