Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1989

When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:
      None

      Description

      Backup namenode initiates the checkpointing process.
      As a part of checkpointing based on the timestamp it tries to download the FSImage or use the existing one.

      Then it tries to save the FSImage.

      During this time it tries to close the editLog streams.

      Parallely when a client tries to close a file just after the checkpointing process closes the editLog Stream then we get an exception saying
      java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams.

      Here the saveNameSpace api closes all the editlog streams resulting in this issue.

        Activity

        Hide
        ramkrishna.s.vasudevan added a comment -

        Why does saveNameSpace api closes all the editLog Streams.

        Ideally once checkpoint starts the streams have already been diverted to edit.new and the in saveCheckpoint we try to move the current directory contents to latestcheckpoint.tmp.

        So in what scneario do we require the editLog.close() which results in closing of all edit streams?

        Show
        ramkrishna.s.vasudevan added a comment - Why does saveNameSpace api closes all the editLog Streams. Ideally once checkpoint starts the streams have already been diverted to edit.new and the in saveCheckpoint we try to move the current directory contents to latestcheckpoint.tmp. So in what scneario do we require the editLog.close() which results in closing of all edit streams?
        Hide
        ramkrishna.s.vasudevan added a comment -

        of syncs: 21 SyncTimes(ms): 130 176
        2011-05-17 14:28:44,921 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to sync edit log.
        java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams.
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logEdit(FSEditLog.java:1029)
        at org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:247)
        at org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:224)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1419)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1413)

        at org.apache.hadoop.ipc.Client.call(Client.java:1052)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:250)
        at $Proxy8.journal(Unknown Source)
        at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.send(EditLogBackupOutputStream.java:181)
        at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:155)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:84)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:515)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1797)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:896)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1419)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1413)

        Show
        ramkrishna.s.vasudevan added a comment - of syncs: 21 SyncTimes(ms): 130 176 2011-05-17 14:28:44,921 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to sync edit log. java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File system changes are not persistent. No journal streams. at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logEdit(FSEditLog.java:1029) at org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:247) at org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:224) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1419) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1413) at org.apache.hadoop.ipc.Client.call(Client.java:1052) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:250) at $Proxy8.journal(Unknown Source) at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.send(EditLogBackupOutputStream.java:181) at org.apache.hadoop.hdfs.server.namenode.EditLogBackupOutputStream.flushAndSync(EditLogBackupOutputStream.java:155) at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:84) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:515) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1797) at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:896) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1419) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1413)
        Hide
        Todd Lipcon added a comment -

        saveNamespace should only be happening while in safemode, and closing a file should not be allowed while in safemode. This may ahve been addressed by HDFS-988.

        On the BN side, it shouldn't be taking a checkpoint while the edits application is in active mode – it should be in "spool" mode where the edits aren't being applied.

        Show
        Todd Lipcon added a comment - saveNamespace should only be happening while in safemode, and closing a file should not be allowed while in safemode. This may ahve been addressed by HDFS-988 . On the BN side, it shouldn't be taking a checkpoint while the edits application is in active mode – it should be in "spool" mode where the edits aren't being applied.
        Hide
        ramkrishna.s.vasudevan added a comment -

        Hi Todd,
        In the backup name node side during checkpointing

            bnImage.loadCheckpoint(sig);
            sig.validateStorageInfo(bnImage);
            bnImage.saveCheckpoint();
        
        void saveCheckpoint() throws IOException {
            saveNamespace(false);
          }
        

        In savenamespace

          void saveNamespace(boolean renewCheckpointTime) throws IOException {
         
            // try to restore all failed edit logs here
            assert editLog != null : "editLog must be initialized";
            storage.attemptRestoreRemovedStorage();
        
            editLog.close();
        

        So here the editlogs are getting closed in the Checkpoint flow.

        This is where the problem comes when the client tries to issue a close file after editLog.close() is exceuted.

        Show
        ramkrishna.s.vasudevan added a comment - Hi Todd, In the backup name node side during checkpointing bnImage.loadCheckpoint(sig); sig.validateStorageInfo(bnImage); bnImage.saveCheckpoint(); void saveCheckpoint() throws IOException { saveNamespace(false); } In savenamespace void saveNamespace(boolean renewCheckpointTime) throws IOException { // try to restore all failed edit logs here assert editLog != null : "editLog must be initialized"; storage.attemptRestoreRemovedStorage(); editLog.close(); So here the editlogs are getting closed in the Checkpoint flow. This is where the problem comes when the client tries to issue a close file after editLog.close() is exceuted.
        Hide
        Todd Lipcon added a comment -

        my point is that the client cannot issue a close() at this time, because the BNN has diverted its logs from "apply" mode to "spool" mode, and clients don't talk directly to the BN.

        Show
        Todd Lipcon added a comment - my point is that the client cannot issue a close() at this time, because the BNN has diverted its logs from "apply" mode to "spool" mode, and clients don't talk directly to the BN.
        Hide
        Konstantin Shvachko added a comment -

        Seems like a valid bug.
        Client does not directly talk to BN, but NN sends the journal transaction (close in the case). And if that kicks in when BN closed edits, but hasn't reopened them yet, the exception can happen.
        Todd's right the transaction should go into journal spool, but I suspect that edits.close() closes all streams including the spool, and that could be the problem.

        Show
        Konstantin Shvachko added a comment - Seems like a valid bug. Client does not directly talk to BN, but NN sends the journal transaction (close in the case). And if that kicks in when BN closed edits, but hasn't reopened them yet, the exception can happen. Todd's right the transaction should go into journal spool, but I suspect that edits.close() closes all streams including the spool, and that could be the problem.
        Hide
        ramkrishna.s.vasudevan added a comment -

        Hi,

        Yes. the editsLog.close() is the problem as it closes all the editStreams including the diverted editStreams.

        Show
        ramkrishna.s.vasudevan added a comment - Hi, Yes. the editsLog.close() is the problem as it closes all the editStreams including the diverted editStreams.
        Hide
        ramkrishna.s.vasudevan added a comment -

        Here in what scenario is it necessary to close the editLogs while saving the name space.
        W.r.t backup namenode already the edits.new streams has been opened and it is not present in the current directory.

        Is there any scenario where the editLogs.close() has to be called?
        Can we add an additional api which does not have this editLogs.close() and call it in checkpointflow alone?
        Pls correct me if am wrong.

        Show
        ramkrishna.s.vasudevan added a comment - Here in what scenario is it necessary to close the editLogs while saving the name space. W.r.t backup namenode already the edits.new streams has been opened and it is not present in the current directory. Is there any scenario where the editLogs.close() has to be called? Can we add an additional api which does not have this editLogs.close() and call it in checkpointflow alone? Pls correct me if am wrong.

          People

          • Assignee:
            Unassigned
            Reporter:
            ramkrishna.s.vasudevan
          • Votes:
            1 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:

              Development