Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: QuorumJournalManager (HDFS-3077)
    • Component/s: ha
    • Labels:
      None

      Description

      Currently, if a logger misses an RPC in the middle of a log segment, or misses the startLogSegment RPC (e.g. it was down or the network was disconnected during that period), it will throw an exception on every subsequent journal() call in that segment, since it knows that it missed some edits in the middle.

      We should change this exception to a specific IOE subclass, and have the client side of QJM detect the situation and stop sending IPCs until the next startLogSegment call.

      This isn't critical for correctness but will help reduce log spew on both sides.
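
      For illustration only, here is a minimal sketch of roughly what such an IOE subclass and the client-side reaction could look like. Only the exception name, JournalOutOfSyncException, is taken from this issue; the enclosing class, field, and method names below are hypothetical and are not the committed patch.

      import java.io.IOException;

      /**
       * Illustrative sketch, not the committed patch: a dedicated IOException
       * subclass plus the client-side reaction described above. Only the
       * exception name comes from this issue; the rest is hypothetical.
       */
      public class OutOfSyncSketch {

        /** Thrown by a JournalNode that has missed edits in the current segment. */
        public static class JournalOutOfSyncException extends IOException {
          private static final long serialVersionUID = 1L;
          public JournalOutOfSyncException(String msg) {
            super(msg);
          }
        }

        /** True once this logger is known to have missed edits in the segment. */
        private boolean outOfSync = false;

        /** Called when a journal() RPC to this logger fails. */
        public void onJournalRpcFailure(IOException e) {
          if (e instanceof JournalOutOfSyncException) {
            // No point sending further edits this segment; they would all fail.
            outOfSync = true;
          }
        }

        /** Checked before each journal() RPC; skip the call while out of sync. */
        public boolean shouldSendEdits() {
          return !outOfSync;
        }

        /** The next segment boundary lets the logger re-join. */
        public void startLogSegment(long txid) {
          outOfSync = false;
          // ... send the startLogSegment RPC to the JournalNode as usual ...
        }
      }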

      Attachments

      1. amend.txt
        2 kB
        Todd Lipcon
      2. hdfs-3726.txt
        13 kB
        Todd Lipcon
      3. hdfs-3726.txt
        15 kB
        Todd Lipcon

        Activity

        Todd Lipcon added a comment -

        Attached patch introduces the improvement as described above.

        There is a new unit test, and I also tested manually as follows:

        • Start cluster configured to write to QJM
        • Start 10 threads performing HDFS transactions
        • Restart one JN

        This used to cause incessant log spew on the console of the restarted JN. With the patch, it resulted in the following on the server side:

        12/09/03 20:55:55 INFO ipc.Server: IPC Server handler 0 on 13001, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 127.0.0.1:47669: error: org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write, no segment open
        org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write, no segment open
                at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:384)
                at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:278)
                at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:121)
                at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:111)
                at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:12442)
        

        and on the NN:

        12/09/03 20:55:55 WARN client.QuorumJournalManager: Remote journal Channel to journal node localhost/127.0.0.1:13001 is not in sync. Will retry on next roll.
        

        The web UI noted: "Written txid 33668659 (120608 behind) (will re-join on next segment)"

        Upon the next roll, it logged:

        12/09/03 20:59:09 INFO namenode.FSEditLog: Rolling edit logs.
        12/09/03 20:59:09 INFO namenode.FSEditLog: Ending log segment 33668125
        12/09/03 20:59:09 INFO namenode.FSEditLog: Number of transactions: 133332 Total time for transactions(ms): 2171Number of transactions batched in Syncs: 102034 Number of syncs: 31297 SyncTimes(ms): 31840 9550 7644 
        12/09/03 20:59:09 INFO namenode.FSEditLog: Number of transactions: 133332 Total time for transactions(ms): 2171Number of transactions batched in Syncs: 102034 Number of syncs: 31298 SyncTimes(ms): 31844 9550 7644 
        12/09/03 20:59:11 INFO namenode.FileJournalManager: Finalizing edits file /tmp/name1-name/current/edits_inprogress_0000000000033668125 -> /tmp/name1-name/current/edits_0000000000033668125-0000000000033801456
        12/09/03 20:59:11 INFO namenode.FileJournalManager: Finalizing edits file /tmp/name1-name2/current/edits_inprogress_0000000000033668125 -> /tmp/name1-name2/current/edits_0000000000033668125-0000000000033801456
        12/09/03 20:59:11 INFO namenode.FSEditLog: Starting log segment at 33801457
        12/09/03 20:59:11 INFO client.QuorumJournalManager: Retrying Channel to journal node localhost/127.0.0.1:13001 in new segment starting at txid 33801457
        

        and the restarted JN was up-to-date again.

        Todd Lipcon added a comment -

        Been thinking about this a bit, and I think it might actually make sense to set the "outOfSync" flag on any exception. For example, if the JN has gone down, the client will receive generic IOExceptions. As soon as we miss any edit in the segment due to such an IOE, there is no sense sending more, since they will just result in JournalOutOfSync errors.

        If folks agree, I will make that change.

        I also want to make another small change at the same time: any time one of the JNs throws an exception, we should log it, even if a majority succeeded. That will help cluster administrators diagnose the case when one of the JNs has gone down or is having disk issues. Currently, these types of errors are silent on the client side, which is not so good.
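
        For illustration, a minimal sketch of both tweaks, with hypothetical class, field, and method names (this is not the actual QJM client code):

        import java.io.IOException;
        import java.util.logging.Logger;

        /**
         * Sketch of the two proposed tweaks, with assumed names: treat any RPC
         * failure as out-of-sync, and log every per-JournalNode failure even
         * when a quorum of journals succeeded.
         */
        public class LoggerFailureSketch {
          private static final Logger LOG =
              Logger.getLogger(LoggerFailureSketch.class.getName());

          private boolean outOfSync = false;

          /** Called whenever a journal() RPC to one JournalNode fails. */
          public void onJournalRpcFailure(String journalNodeAddr, IOException e) {
            // Any failure means this logger may have missed an edit in the
            // current segment, so further journal() calls would only produce
            // JournalOutOfSync errors.
            outOfSync = true;

            // Log the failure even if a majority of journals succeeded, so an
            // administrator can tell that one JN is down or has disk trouble.
            LOG.warning("Remote journal " + journalNodeAddr
                + " failed to write edits: " + e + ". Will retry on next roll.");
          }
        }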

        Eli Collins added a comment -

        Setting outOfSync on any exception makes sense to me.

        Todd Lipcon added a comment -

        Here's an updated patch. I ran all the QJM tests and they passed.

        Eli Collins added a comment -

        +1 looks great

        Nit: "excpetion"

        Todd Lipcon added a comment -

        Changed the comment to read:

          /**
           * If this logger misses some edits, or restarts in the middle of
           * a segment, the writer won't be able to write any more edits until
           * the beginning of the next segment. Upon detecting this situation,
           * the writer sets this flag to true to avoid sending useless RPCs.
           */
        

        (which is more accurate given the change made above)

        Will commit momentarily

        Todd Lipcon added a comment -

        Committed to branch, thanks for the review.

        Eli Collins added a comment -

        Oops, the patch needs to include JournalOutOfSyncException, sorry I missed that!

        Todd Lipcon added a comment -

        Woops, here's the patch to add JournalOutOfSyncException.java. Sorry I forgot to git add!

        Eli Collins added a comment -

        +1 thanks!

        Todd Lipcon added a comment -

        Committed the new file to branch as well. Thanks


          People

          • Assignee: Todd Lipcon
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 3
