Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Component/s: ha
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This protocol is used to synchronize lagging journal service with active journal service with complete finalized edit segments. Currently it supports one rpc call getEditLogManifest() which lists finalized edit segments.

      1. HDFS-3313.HDFS-3092.patch
        31 kB
        Brandon Li
      2. HDFS-3313.HDFS-3092.patch
        29 kB
        Brandon Li

        Activity

        Hide
        Tsz Wo Nicholas Sze added a comment -

        Looks good. Some comments:

        • We should not change NameNodeProxies. NN is not involved in journal synchronization.
        • For the same reason, the principals in JournalSyncProtocolPB should not be Namenode.
          +    serverPrincipal = DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY,
          +    clientPrincipal = DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY)
          
        • Use FSEditLog.getEditLogManifest(long fromTxId) instead of FileJournalManager and don't change FileJournalManager.
        • Why DN is related to JournalSyncProtocol? Please check the comments.
          +   * If you are adding/changing DN's interface then you need to change both this
          
        • I think we should pass JournalInfo in getEditLogManifest(..) so that it could first verify the version, namespace id, etc.
        • Why the test is commented out?
          -  @Test
          +  //@Test
             public void testHttpServer() throws Exception {
          
        Show
        Tsz Wo Nicholas Sze added a comment - Looks good. Some comments: We should not change NameNodeProxies. NN is not involved in journal synchronization. For the same reason, the principals in JournalSyncProtocolPB should not be Namenode. + serverPrincipal = DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY, + clientPrincipal = DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY) Use FSEditLog.getEditLogManifest(long fromTxId) instead of FileJournalManager and don't change FileJournalManager. Why DN is related to JournalSyncProtocol? Please check the comments. + * If you are adding/changing DN's interface then you need to change both this I think we should pass JournalInfo in getEditLogManifest(..) so that it could first verify the version, namespace id, etc. Why the test is commented out? - @Test + //@Test public void testHttpServer() throws Exception {
        Hide
        Aaron T. Myers added a comment -

        Seems like this patch duplicates a lot of code from NamenodeProtocol#getEditLogManifest and related translators. Can this perhaps be factored out into a generic RemoteEditLogProtocol that both the NN and journal service implement?

        Show
        Aaron T. Myers added a comment - Seems like this patch duplicates a lot of code from NamenodeProtocol#getEditLogManifest and related translators. Can this perhaps be factored out into a generic RemoteEditLogProtocol that both the NN and journal service implement?
        Hide
        Brandon Li added a comment -

        Seems like this patch duplicates a lot of code from NamenodeProtocol#getEditLogManifest and related translators. Can this perhaps be factored out into a generic RemoteEditLogProtocol that both the NN and journal service implement?

        Thanks for the comment.
        As pointed out by Nicholas, the interface of getEditLogManifes() in JournalSyncProtocol should be different(at least adding JournalInfo into the request) with the one in namenode protocol.

        Show
        Brandon Li added a comment - Seems like this patch duplicates a lot of code from NamenodeProtocol#getEditLogManifest and related translators. Can this perhaps be factored out into a generic RemoteEditLogProtocol that both the NN and journal service implement? Thanks for the comment. As pointed out by Nicholas, the interface of getEditLogManifes() in JournalSyncProtocol should be different(at least adding JournalInfo into the request) with the one in namenode protocol.
        Hide
        Brandon Li added a comment -

        We should not change NameNodeProxies. NN is not involved in journal synchronization.

        fixed by creating a static method in JournalService to get the roc proxy.

        For the same reason, the principals in JournalSyncProtocolPB should not be Namenode.

        Replaced with Journal User Name

        Use FSEditLog.getEditLogManifest(long fromTxId) instead of FileJournalManager and don't change FileJournalManager.

        done.

        Why DN is related to JournalSyncProtocol? Please check the comments.

        fixed.

        I think we should pass JournalInfo in getEditLogManifest(..) so that it could first verify the version, namespace id, etc.

        done.

        Why the test is commented out?

        fixed.

        Show
        Brandon Li added a comment - We should not change NameNodeProxies. NN is not involved in journal synchronization. fixed by creating a static method in JournalService to get the roc proxy. For the same reason, the principals in JournalSyncProtocolPB should not be Namenode. Replaced with Journal User Name Use FSEditLog.getEditLogManifest(long fromTxId) instead of FileJournalManager and don't change FileJournalManager. done. Why DN is related to JournalSyncProtocol? Please check the comments. fixed. I think we should pass JournalInfo in getEditLogManifest(..) so that it could first verify the version, namespace id, etc. done. Why the test is commented out? fixed.
        Hide
        Aaron T. Myers added a comment -

        As pointed out by Nicholas, the interface of getEditLogManifes() in JournalSyncProtocol should be different(at least adding JournalInfo into the request) with the one in namenode protocol.

        Why couldn't you send the JournalInfo in all cases, both when calling getEditLogManifest on the NN for checkpointing, and in the journal service?

        Even with this slight difference, there's still a lot of duplicated code between the two, and it seems a shame to implement two parallel, nearly identical methods in distinct protocols.

        Show
        Aaron T. Myers added a comment - As pointed out by Nicholas, the interface of getEditLogManifes() in JournalSyncProtocol should be different(at least adding JournalInfo into the request) with the one in namenode protocol. Why couldn't you send the JournalInfo in all cases, both when calling getEditLogManifest on the NN for checkpointing, and in the journal service? Even with this slight difference, there's still a lot of duplicated code between the two, and it seems a shame to implement two parallel, nearly identical methods in distinct protocols.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Hi Aaron, your suggestion makes sense but it is not easy to do the refactoring. I actually want to refactor all editlog related stuff. Let's get the journal service implementation done first and then do the refactoring later.

        Show
        Tsz Wo Nicholas Sze added a comment - Hi Aaron, your suggestion makes sense but it is not easy to do the refactoring. I actually want to refactor all editlog related stuff. Let's get the journal service implementation done first and then do the refactoring later.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        +1 patch looks good.

        I have committed this. Thanks, Brandon!

        Show
        Tsz Wo Nicholas Sze added a comment - +1 patch looks good. I have committed this. Thanks, Brandon!
        Hide
        Brandon Li added a comment -


        One refactoring we can do is to remove getEditLogManifes() from namenode protocol and let namenode use JournalSyncProtocol. This is similar as the one Aaron suggested, with the difference that we won't have a protocol supportting only getEditLogManifes() just to reduce dup code.

        Show
        Brandon Li added a comment - One refactoring we can do is to remove getEditLogManifes() from namenode protocol and let namenode use JournalSyncProtocol. This is similar as the one Aaron suggested, with the difference that we won't have a protocol supportting only getEditLogManifes() just to reduce dup code.
        Hide
        Aaron T. Myers added a comment -

        One refactoring we can do is to remove getEditLogManifes() from namenode protocol and let namenode use JournalSyncProtocol. This is similar as the one Aaron suggested, with the difference that we won't have a protocol supportting only getEditLogManifes() just to reduce dup code.

        Sure, that'll work until you want to add more methods to the JournalSyncProtocol that don't make sense for the NN to implement.

        Show
        Aaron T. Myers added a comment - One refactoring we can do is to remove getEditLogManifes() from namenode protocol and let namenode use JournalSyncProtocol. This is similar as the one Aaron suggested, with the difference that we won't have a protocol supportting only getEditLogManifes() just to reduce dup code. Sure, that'll work until you want to add more methods to the JournalSyncProtocol that don't make sense for the NN to implement.

          People

          • Assignee:
            Brandon Li
            Reporter:
            Brandon Li
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development