Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ha, hdfs-client
    • Labels:
      None

        Activity

        sanjay.radia Sanjay Radia added a comment -

        The NN may, in some situations, take a fairly long time to start, because it has to load the image, apply edits, and discover block locations. This may cause clients to time out and declare the NN dead. Hence, when the Active is starting up, it should respond to client requests with a “startingUp” response to indicate that the client should wait. This mode is a special case of safemode.
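
        A minimal sketch of how a client might treat such a response, assuming the "startingUp" reply surfaces as an exception. StartingUpException, StartupAwareClient and callWithStartupRetry are hypothetical names used only for illustration; they are not existing Hadoop classes or methods.

```java
import java.util.concurrent.Callable;

/** Hypothetical exception standing in for a "startingUp" reply; not a Hadoop class. */
class StartingUpException extends Exception {
    StartingUpException(String message) {
        super(message);
    }
}

/** Hypothetical client-side helper that waits instead of declaring the NN dead. */
final class StartupAwareClient {
    private StartupAwareClient() {
    }

    /**
     * Runs the call and retries it while the server reports that it is
     * still starting up. Gives up after maxAttempts attempts and rethrows
     * the last StartingUpException. Any other exception propagates at once.
     */
    static <T> T callWithStartupRetry(Callable<T> call, int maxAttempts, long waitMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        StartingUpException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (StartingUpException e) {
                last = e;
                // Sleep only if another attempt will follow.
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis);
                }
            }
        }
        throw last;
    }
}
```

        The bounded attempt count keeps the client from waiting forever, which is the trade-off between patience and timing out that the proposal leaves open.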

        chansler Robert Chansler added a comment -

        But also see HDFS-1880 that suggests that clients should time out. I suppose it's a matter of just how impatient you are.

        atm Aaron T. Myers added a comment -

        I've taken a look into implementing this. The only difficulty with the current NN architecture is that creating the NN's SecretManager for delegation tokens requires that the fsimage and edits files have already been loaded, and the NN's RPC servers presently can't be created without an already-initialized SecretManager. The solution, then, is to make o.a.h.ipc.Server able to be created without an actual SecretManager, and to have it throw an exception at the IPC layer whenever a connection is made, to indicate "I'm about to be up, but not fully initialized yet." Then the NN can load the image/edits and shove a SecretManager into the existing o.a.h.ipc.Server instance(s).
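
        The idea can be sketched roughly as follows. This is not o.a.h.ipc.Server code: TokenVerifier, ServerNotYetInitializedException and LateBoundRpcServer are hypothetical stand-ins, and the real SecretManager API is not modeled here. The sketch only shows a server that is constructed without its verifier, rejects every connection until one is installed, and then starts using it.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a delegation-token SecretManager. */
interface TokenVerifier {
    boolean verify(String token);
}

/** Hypothetical exception meaning "about to be up, but not fully initialized yet". */
class ServerNotYetInitializedException extends RuntimeException {
    ServerNotYetInitializedException(String message) {
        super(message);
    }
}

/** Hypothetical server that can be created before its verifier exists. */
final class LateBoundRpcServer {
    // Empty until the image and edits have been loaded and a verifier is installed.
    private final AtomicReference<TokenVerifier> verifier = new AtomicReference<>();

    /** Installs the verifier exactly once, after startup work has finished. */
    void installVerifier(TokenVerifier v) {
        if (v == null) {
            throw new NullPointerException("verifier must not be null");
        }
        if (!verifier.compareAndSet(null, v)) {
            throw new IllegalStateException("verifier already installed");
        }
    }

    /** Rejects connections until a verifier is installed, then checks the token. */
    boolean acceptConnection(String token) {
        TokenVerifier v = verifier.get();
        if (v == null) {
            throw new ServerNotYetInitializedException(
                    "About to be up, but not fully initialized yet");
        }
        return v.verify(token);
    }
}
```

        Holding the verifier in an AtomicReference lets connection-handling threads observe the installation without extra locking; that is a design choice of this sketch, not something stated in the issue.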

        Thing is, this work doesn't really help much with the HA design of HDFS-1623 since the NN will already need to be changed to bring the NN RPC servers up (e.g. to receive block reports), and that work can't really be done until HDFS-1974 is implemented. So, I'm going to hold off on doing this work until HDFS-1974 is available.

        atm Aaron T. Myers added a comment -

        I think this issue is superseded by the changes made in HADOOP-7896 and HDFS-2680. If anyone disagrees, please feel free to reopen it.


          People

          • Assignee:
            Unassigned
            Reporter:
            sanjay.radia Sanjay Radia
          • Votes:
            0
            Watchers:
            7
