Hadoop Common / HADOOP-756

new dfsadmin command to wait until safe mode is exited

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels:
      None

      Description

      I would like to have a dfsadmin command that waits until dfs leaves safemode. I want to be able to have my start up scripts wait until dfs is up before starting my jobtracker.

      1. safemodewait2.patch
        2 kB
        dhruba borthakur

        Activity

        dhruba borthakur added a comment -

        A simple polling loop (outside of HDFS) can achieve this goal too.

        dhruba borthakur added a comment -

        Hi Owen,

        Can you please give me some background regarding this issue? Can we achieve your required objective by implementing a simple polling loop outside HDFS?

        thanks,
        dhruba

        Owen O'Malley added a comment -

        Of course you could write a loop in each script that needs to do this, but it seems like a pretty common wish. You don't really want to start up map/reduce clusters until the underlying hdfs is ready. It should be a couple of lines to add the functionality.

        dhruba borthakur added a comment -

        OK, then I will create a new dfsadmin option. Currently we have a command of the following type:

        bin/hadoop dfsadmin -safemode [get][set][clear]

        I am going to add another option:

        bin/hadoop dfsadmin -safemode [get][set][clear][wait]

        The [wait] option will cause the command to block until HDFS exits safemode. Internally, the DFSAdmin class will use a while-sleep-poll loop to wait.
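
The while-sleep-poll loop described above could look roughly like the following. This is a minimal sketch, not the code from the attached patch: the class name `SafeModeWaitSketch`, the method `waitForSafeModeExit`, and the `BooleanSupplier` standing in for the namenode safemode query are all assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

/** Sketch of a while-sleep-poll wait; not the code from the attached patch. */
class SafeModeWaitSketch {

    /**
     * Blocks until inSafeMode reports false, sleeping pollMillis between polls.
     * inSafeMode is a hypothetical stand-in for the query sent to the namenode.
     * Returns the number of polls that were made.
     */
    static int waitForSafeModeExit(BooleanSupplier inSafeMode, long pollMillis)
            throws InterruptedException {
        int polls = 1;
        boolean stillInSafeMode = inSafeMode.getAsBoolean();
        while (stillInSafeMode) {
            Thread.sleep(pollMillis);                     // wait before asking again
            stillInSafeMode = inSafeMode.getAsBoolean();  // one query per iteration
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake namenode that reports safemode for the first two queries only.
        int[] remaining = {2};
        int polls = waitForSafeModeExit(() -> remaining[0]-- > 0, 10L);
        System.out.println("Left safemode after " + polls + " polls");
    }
}
```

Passing the query in as a parameter keeps the loop testable without a running cluster; a real implementation would issue the safemode query to the namenode instead.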

        dhruba borthakur added a comment -

        Code review requested.

        Owen O'Malley added a comment -

        It looks good except that the code is a little strange:

        boolean done = false;
        while (!done) {
          done = ...;
        }

        is better than:

        while (true) {
          boolean done = ...;
          if (done) {
            break;
          }
          ...
        }

        dhruba borthakur added a comment -

        The earlier file was corrupted during upload. Please review this one.

        Konstantin Shvachko added a comment -

        while (!done) {
        evaluates the conditional expression once per iteration, whereas
        while (true) { ... if (done)
        evaluates it twice.

        Owen O'Malley added a comment -

        True, but unless this is a tight inner loop, the performance gain isn't worth the added mental complexity of a non-standard loop body.

        dhruba borthakur added a comment -

        Incorporated code review comments.

        Raghu Angadi added a comment -

        A 5-second sleep is very long. Also, do we need to throw an exception when interrupted?

        I would write:

        while (!(done = isDone())) {
          sleep(100);
        }

        Hadoop QA added a comment -

        +1, http://issues.apache.org/jira/secure/attachment/12346438/safemodewait1.patch applied and successfully tested against trunk revision r481432

        dhruba borthakur added a comment -

        I think the 5-second sleep is appropriate. A shorter interval would make the namenode do more wasted work, because each poll is an RPC call.

        Regarding the "exception during interrupt": the higher-level function catches the interrupt and prints an error message. This lets the user know that the call terminated because of an interrupt (and not because the namenode exited safemode).
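
The interrupt handling described above can be sketched as follows. This is an illustration under stated assumptions, not the committed DFSAdmin code: the class `SafeModeWaitCommandSketch`, the `run` method, its return codes, and the error message text are hypothetical, and a `BooleanSupplier` again stands in for the namenode query.

```java
import java.util.function.BooleanSupplier;

/** Sketch of a caller that reports an interrupted wait; names are hypothetical. */
class SafeModeWaitCommandSketch {

    /** Low-level wait: lets InterruptedException propagate to the caller. */
    static void waitForSafeModeExit(BooleanSupplier inSafeMode, long pollMillis)
            throws InterruptedException {
        while (inSafeMode.getAsBoolean()) {
            Thread.sleep(pollMillis);
        }
    }

    /**
     * Higher-level entry point: returns 0 if the namenode left safemode,
     * or -1 if the wait was interrupted first (assumed exit codes).
     */
    static int run(BooleanSupplier inSafeMode, long pollMillis) {
        try {
            waitForSafeModeExit(inSafeMode, pollMillis);
            return 0;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers further up can still see it.
            Thread.currentThread().interrupt();
            System.err.println("safemode wait: interrupted before the namenode left safemode");
            return -1;
        }
    }
}
```

With this split, a script calling the command can tell from the message and the non-zero result that the wait ended early rather than because safemode was exited.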

        Owen O'Malley added a comment -

        I think the 5 seconds is great.

        Raghu Angadi added a comment -

        Correct, I was not thinking of an RPC call.

        Konstantin Shvachko added a comment -

        I think it is better if DFSAdmin prints something when the wait is over.

        dhruba borthakur added a comment -

        Incorporated review comments from Konstantin.

        Doug Cutting added a comment -

        I just committed this. Thanks, Dhruba.


          People

          • Assignee:
            dhruba borthakur
            Reporter:
            Owen O'Malley