Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: scripts
    • Labels:

      Description

      From HDFS-7184, minor issues with balancer:

      • daemon isn't set to true in hdfs to enable daemonization
      • start-balancer script has usage instead of hadoop_usage
      1. HDFS-7204.patch
        2 kB
        Allen Wittenauer
      2. HDFS-7204-01.patch
        2 kB
        Allen Wittenauer

        Issue Links

          Activity

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/)
          HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/)
          HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/720/)
          HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6306 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6306/)
          HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          aw Allen Wittenauer added a comment -

          Committed.

          Thanks for the review!

          benoyantony Benoy Antony added a comment -

          +1

          yzhangal Yongjun Zhang added a comment -

          Thanks Allen Wittenauer, I created HADOOP-11208.

          aw Allen Wittenauer added a comment -

          Maybe we change the variable name "daemon" to something like "run_via_dh" (run via daemon handler) and add a comment like Allen summarized? Thanks.

          Sure. Open a jira under hadoop common to rename daemon and I'll work something up.

          BTW, it's probably worth pointing out that if you look at hadoop-config.sh, you'll see where --daemon is specifically handled.

          yzhangal Yongjun Zhang added a comment -

          Thanks Benoy Antony for asking and Allen Wittenauer for the detailed explanation of the daemon variable.

          It happens that the same string "daemon" is used both as a variable in the script and as a command-line switch, and the two mean different things, which is a bit confusing. Maybe we change the variable name "daemon" to something like "run_via_dh" (run via daemon handler) and add a comment like Allen summarized? Thanks.

          benoyantony Benoy Antony added a comment -

          Thanks for the detailed reply, Allen Wittenauer. I'll update HDFS-7184 accordingly.

          aw Allen Wittenauer added a comment - - edited

          Benoy Antony, I promise there is a method to the madness.

          TL;DR: No, yes, no.

          Longer:

          In branch-2 and previous, "daemons" were handled via wrapping "standard" command lines. If we concentrate on the functionality (vs. the code rot...), this has some interesting (and inconsistent) results, especially around logging and pid files. If you run the *-daemon.sh version, you get a pid file and hadoop.root.logger is set to INFO,(something). When a daemon is run in non-daemon mode (e.g., straight up: 'hdfs namenode'), no pid file is generated and hadoop.root.logger is kept as INFO,console. With no pid file generated, it is possible to run, e.g., hdfs namenode in *-daemon.sh mode and then again in straight-up mode. It also means that one needs to pull apart the process list to safely determine the status of the daemon, since pid files aren't always created. This made building custom init scripts fraught with danger. This inconsistency has been a point of frustration for many operations teams.

          In branch-3/post-HADOOP-9902, there is a slight change in the above functionality, and it is one of the key reasons why this is an incompatible change. Sub-commands that were intended to run as daemons (either fully, e.g., namenode, or partially, e.g., balancer) have all of this handling consolidated, helping to eliminate code rot as well as providing a consistent user experience across projects. daemon=true, which is a per-script local but is consistent across the hadoop sub-projects, tells the later parts of the shell code that this sub-command needs some extra handling enabled beyond the normal commands. In particular, sub-commands with daemon=true will always get pid and out files. They will prevent two instances from being run on the same machine by the same user simultaneously (see footnote 1, however). They get some extra options on the java command line. Etc, etc.

          So where does --daemon come in? The value of that is stored in a global called HADOOP_DAEMON_MODE. If the user doesn't set it specifically, it defaults to 'default'. This was done to allow the code to mostly replicate the behavior of branch-2 and previous when the *-daemon.sh code was NOT used. In other words, --daemon default (or no value provided) lets commands like hdfs namenode still run in the foreground, just now with pid and out files. --daemon start does the disown (previously a nohup), changes the logging output from HADOOP_ROOT_LOGGER to HADOOP_DAEMON_ROOT_LOGGER, adds some extra command line options, etc, etc, similar to the *-daemon.sh commands.

          What happens if daemon mode is set for all commands? The big thing is the pid and out file creation and the checks around it. A user would only ever be able to execute one 'hadoop fs' command at a time because of the pid file! Less than ideal.

          To summarize, daemon=true tells the code that --daemon actually means something to the sub-command. Otherwise, --daemon is ignored.
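
          The relationship between the two can be sketched in shell roughly as follows. This is an illustration only, not the code in bin/hdfs or hadoop-functions.sh: the function names pick_subcommand and run_subcommand are hypothetical, the list of daemon-capable sub-commands is abbreviated, and the echoed strings merely describe the behavior discussed above. Only the daemon variable and HADOOP_DAEMON_MODE come from this comment.

```shell
# Illustrative sketch only. pick_subcommand and run_subcommand are
# hypothetical names, not functions from the Hadoop shell scripts.

# --daemon stores its value here; with no --daemon it stays 'default'.
HADOOP_DAEMON_MODE="${HADOOP_DAEMON_MODE:-default}"

# Per-script step: mark the sub-commands that support daemon handling.
pick_subcommand() {
  daemon="false"
  case "$1" in
    namenode|datanode|balancer)
      daemon="true"
      ;;
    *)
      ;;
  esac
}

# Shared step: --daemon only means something when daemon=true.
run_subcommand() {
  pick_subcommand "$1"
  if [ "${daemon}" = "true" ]; then
    case "${HADOOP_DAEMON_MODE}" in
      default) echo "$1: foreground, pid and out files" ;;
      start)   echo "$1: background, pid and out files" ;;
      stop)    echo "$1: stop via pid file" ;;
      status)  echo "$1: status via pid file" ;;
      *)       echo "$1: unknown daemon mode" >&2; return 1 ;;
    esac
  else
    # Ordinary commands ignore --daemon entirely.
    echo "$1: foreground, no pid file"
  fi
}
```

          With this shape, a command such as fs never creates a pid file no matter what --daemon says, which avoids the one-at-a-time problem described above.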

          1-... unless HADOOP_IDENT_STRING is modified appropriately. This means that in branch-3, it is now possible to run two secure datanodes on the same machine as the same user, since all of the logs, pids, and outs, take that into consideration! QA folks should be very happy.
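
          As a rough illustration of the footnote, suppose the pid file name includes the ident string and a start is refused while that file names a live process. The helpers pidfile_for and claim_pidfile are hypothetical, and the directory variable, its /tmp default, and the file name pattern are assumptions made for this sketch; only HADOOP_IDENT_STRING is taken from the comment above.

```shell
# Hypothetical helpers for illustration; the path layout is an assumption.

# Build a pid file path that includes the ident string.
pidfile_for() {
  # $1 = sub-command name, e.g. datanode
  echo "${HADOOP_PID_DIR:-/tmp}/hadoop-${HADOOP_IDENT_STRING:-${USER:-unknown}}-$1.pid"
}

# Refuse to start when the pid file already names a running process;
# otherwise record the current shell's pid in the file.
claim_pidfile() {
  pidfile="$(pidfile_for "$1")"
  if [ -f "${pidfile}" ] && kill -0 "$(cat "${pidfile}")" 2>/dev/null; then
    echo "$1 is already running; see ${pidfile}" >&2
    return 1
  fi
  echo "$$" > "${pidfile}"
}
```

          Under these assumptions, two datanodes started with different HADOOP_IDENT_STRING values get different pid files and do not block each other, while a second start with the same ident string is rejected.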

          benoyantony Benoy Antony added a comment -

          Allen Wittenauer,
          Can we avoid setting daemon=true for each component like balancer?
          I may be missing the intention behind the internal boolean variable, daemon.
          Shouldn't it be set based on whether --daemon is part of the invocation?

          yzhangal Yongjun Zhang added a comment -

          Hi Allen Wittenauer, thanks for the explanation, it's very helpful. Good work on the shell rewrite!

          aw Allen Wittenauer added a comment -

          (It's probably worth pointing out that this is one of the big benefits of the shell rewrite. Prior to 9902, if you wanted to turn a subcommand into a daemon, you either had to know the magic incantation to give hadoop-daemon.sh, hack on one of the existing sbin scripts, or build a custom sbin script such as was done for balancer. In trunk, turning into a daemon is just one line in the appropriate script: daemon="true". Case in point, adding that to hdfs' balancer case statement does 99% of the work in this issue. The rest is just minor fixing of backward compatibility issues! This means that for HDFS-7184, it's literally a single line fix.)

          aw Allen Wittenauer added a comment -

          TL;DR: no.

          Longer:

          In branch-2 and earlier, there are sbin/start-balancer.sh and sbin/stop-balancer.sh. These commands basically turn balancer into a background process that, like when you run it in interactive mode, runs until completion. The difference is that the output is sent to a log file. These commands do this by running the balancer underneath hadoop-daemon.sh.

          Post-HADOOP-9902, however, hadoop-daemon.sh has been neutered, with all of its functionality moved into hadoop-functions.sh along with bin/hdfs, bin/mapred, bin/yarn, etc. These scripts have some glue that triggers the necessary bits to run stuff in this background mode. When 9902 was committed, running the balancer in the previous way was broken (completely my fault!). This patch restores that functionality. Also, since the balancer is running in this background/daemon mode, it gets a working pid file and the ability to do hdfs --daemon status to see if the balancer is still running.

          To get the functionality that you are talking about would require much more extensive changes. Most people with large enough setups that want to run balancer "permanently" tend to just run it from cron or whatever.
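
          In practice, the start, stop, and status operations described here all reduce to calling bin/hdfs with a different --daemon value. The wrapper below is a minimal sketch of that idea, not the contents of sbin/start-balancer.sh or sbin/stop-balancer.sh: balancer_ctl is a hypothetical name, and HDFS_CMD is an assumption added so the sketch can be dry-run (for example with HDFS_CMD=echo) on a machine without Hadoop installed.

```shell
# Hypothetical wrapper for illustration only.
# HDFS_CMD defaults to the real hdfs command; set HDFS_CMD=echo for a dry run.
balancer_ctl() {
  # $1 = start | stop | status; any remaining arguments go to the balancer
  action="$1"
  shift
  ${HDFS_CMD:-hdfs} --daemon "${action}" balancer "$@"
}

# Example usage on a configured cluster:
#   balancer_ctl start    # run the balancer in the background until it completes
#   balancer_ctl status   # check, via the pid file, whether it is still running
#   balancer_ctl stop     # stop a running balancer
```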

          yzhangal Yongjun Zhang added a comment -

          Hi Allen Wittenauer,

          Thanks for your work on this. I have a question about running the balancer as a daemon. Users used to run the balancer at the command line and wait for it to finish (it finishes when the balancing criteria are satisfied). Now that it can be run as a daemon, does that mean the balancer daemon continues to monitor the HDFS state and automatically starts the balancing process when it detects that the balancing criteria are not satisfied? Thanks.

          aw Allen Wittenauer added a comment -

          Manual run gives the expected:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          aw Allen Wittenauer added a comment -

          Umm, I think Jenkins has gone nuts... because none of that makes any sense.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12673713/HDFS-7204-01.patch
          against trunk revision 2217e2f.

          -1 @author. The patch appears to contain @author tags which the Hadoop community has agreed to not allow in code contributions.

          +1 tests included. The patch appears to include new or modified test files.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8354//console

          This message is automatically generated.

          aw Allen Wittenauer added a comment -

          -01:

          • Another minor fix: exec hdfs in stop-balancer.
          aw Allen Wittenauer added a comment -

          -00:

          • Enables daemonization for balancer such that hdfs --daemon start balancer works
          • Fixes usage message for sbin/start-balancer.sh
          • Removes the legacy hadoop-daemon.sh middleman; calls bin/hdfs directly

            People

            • Assignee:
              aw Allen Wittenauer
              Reporter:
              aw Allen Wittenauer
            • Votes:
              0
              Watchers:
              6

              Dates

              • Created:
                Updated:
                Resolved:

                Development