Hadoop Common / HADOOP-12930

[Umbrella] Dynamic subcommands for hadoop shell scripts

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: scripts
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      It is now possible to add new subcommands to the hadoop, hdfs, mapred, and yarn scripts, or to modify the behavior of existing ones. See the Unix Shell Guide for more information.
    • Flags:
      Important

      Description

      Umbrella for converting hadoop, hdfs, mapred, and yarn to allow for dynamic subcommands. See first comment for more details.

      1. HADOOP-12930.00.patch
        126 kB
        Allen Wittenauer

        Issue Links

        1. bin/hadoop work for dynamic subcommands (Sub-task, Resolved, Allen Wittenauer)
        2. bin/yarn work for dynamic subcommands (Sub-task, Resolved, Allen Wittenauer)
        3. bin/hdfs work for dynamic subcommands (Sub-task, Resolved, Allen Wittenauer)
        4. bin/mapred work for dynamic subcommands (Sub-task, Resolved, Allen Wittenauer)
        5. API documentation for dynamic subcommands (Sub-task, Resolved, Allen Wittenauer)
        6. modify hadoop-tools to take advantage of dynamic subcommands (Sub-task, Resolved, Allen Wittenauer)
        7. enable daemonization of dynamic commands (Sub-task, Resolved, Allen Wittenauer)
        8. env var doc update for dynamic commands (Sub-task, Resolved, Allen Wittenauer)
        9. fix shellprofiles in hadoop-tools to allow replacement (Sub-task, Resolved, Allen Wittenauer)
        10. hadoop distcp adds client opts twice when dynamic (Sub-task, Resolved, Allen Wittenauer)
        11. hadoop-common unit tests for dynamic commands (Sub-task, Resolved, Allen Wittenauer)
        12. hadoop-hdfs unit tests for dynamic commands (Sub-task, Resolved, Allen Wittenauer)
        13. clean up how rumen is executed (Sub-task, Resolved, Allen Wittenauer)
        14. dynamic subcommands need a way to manipulate arguments (Sub-task, Resolved, Allen Wittenauer)
        15. add a streaming subcommand to mapred (Sub-task, Resolved, Allen Wittenauer)
        16. convert hadoop gridmix to be dynamic (Sub-task, Resolved, Allen Wittenauer)
        17. dynamic subcommand docs should talk about exit vs. continue program flow (Sub-task, Resolved, Allen Wittenauer)
        18. clarify daemonization and security vars for dynamic commands (Sub-task, Resolved, Allen Wittenauer)
        19. add a --debug message when dynamic commands have been used (Sub-task, Resolved, Allen Wittenauer)
        20. rename sub-project shellprofiles to match the rest of Hadoop (Sub-task, Resolved, Allen Wittenauer)
        21. fix typo in dynamic subcommand docs (Sub-task, Resolved, Allen Wittenauer)
        22. fix typo in debug statement for dynamic subcommands (Sub-task, Resolved, Allen Wittenauer)
        23. Underscores should be escaped in dynamic subcommands document (Sub-task, Resolved, Allen Wittenauer)
        24. Fix dynamic subcommands error on multiple arguments (Sub-task, Resolved, Masatake Iwasaki)

          Activity

          aw Allen Wittenauer added a comment (edited)

          It is extremely desirable to be able to add subcommands to the hadoop, hdfs, mapred, and yarn commands dynamically. There are several reasons to do this:

          • Enable local variants of subcommands without worrying about breaking existing scripts. For example, distcp has historically been replaced with local versions for various reasons.
          • Allow for greater testing capabilities.
          • Let third-party (external-to-Hadoop) code add capabilities and take advantage of the rich shell environment.

          Enabling this is relatively trivial:

          • look for a function defined with a given pattern that matches the subcommand passed via the CLI
          • if that function exists, execute it with passed parameters
          • if that function doesn't exist, continue on with our normal processing
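
          The three steps above can be sketched in bash roughly as follows. This is a minimal illustration, not the code from the patch: the tool name mytool, the naming pattern mytool_subcommand_<name>, and the helper functions are all hypothetical, chosen only to show the lookup-then-fall-through flow.

```bash
#!/usr/bin/env bash
# Sketch only: every name here (mytool, mytool_subcommand_*, mytool_builtin,
# mytool_dispatch) is hypothetical and not taken from the Hadoop scripts.

# A dynamically supplied subcommand, e.g. defined in a site-local file that
# the main script sources before dispatching.
mytool_subcommand_hello() {
  echo "dynamic hello: $*"
}

# Stand-in for the script's normal, built-in subcommand processing.
mytool_builtin() {
  local subcmd=$1
  shift
  case "${subcmd}" in
    version) echo "mytool 0.0" ;;
    *) echo "unknown subcommand: ${subcmd}" >&2; return 1 ;;
  esac
}

# Look for a function matching the subcommand; run it if it exists,
# otherwise continue with normal processing.
mytool_dispatch() {
  if [[ $# -eq 0 ]]; then
    echo "usage: mytool SUBCOMMAND [ARGS...]" >&2
    return 1
  fi
  local subcmd=$1
  shift
  local fn="mytool_subcommand_${subcmd}"
  if declare -F "${fn}" >/dev/null; then
    "${fn}" "$@"                      # dynamic function found
  else
    mytool_builtin "${subcmd}" "$@"   # fall back to built-in handling
  fi
}
```

          With these definitions, calling mytool_dispatch hello a b runs the dynamic function, while mytool_dispatch version falls through to the built-in case statement. Because the check happens before the built-in handling, a function named after an existing subcommand would override it in this sketch.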

          In order to accomplish this, a few things need to take place:

          • re-arrange the existing commands a bit for legibility/flexibility
          • turn some shell-local globals into 'safe' globals that can be used beyond the script that defines them
          • define an API by which 3rd parties may add and override existing commands
          • get HADOOP-12857 committed for some pre-work
          aw Allen Wittenauer added a comment -

          rebased HADOOP-12930 with trunk

          aw Allen Wittenauer added a comment -

          FYI: this branch is now usable and includes some examples now that hadoop-tools is using the functionality to enable distcp, etc.

          aw Allen Wittenauer added a comment -

          rebased to trunk.

          Other than what to do about the unit tests for yarn and mapreduce shell script code, I think this branch is pretty much ready for merging.

          aw Allen Wittenauer added a comment -

          Just to copy what I sent to common-dev here:

          ===

          When the sub-projects were re-merged and the Maven work was done, the shell scripts for MR and YARN were placed (effectively) outside of the normal Maven hierarchy. Adding unit tests for these sub-projects' shell scripts means effectively turning hadoop-yarn-project/hadoop-yarn and hadoop-mapreduce-project into “real” modules so that mvn test works as expected. Doing so will likely have some surprising consequences: for example, any patch that modifies both Java code and shell code will trigger all of the unit tests in YARN.

          I think we have four options:

          a) Continue forward, turning these into real modules with src directories, etc., and live with the consequences

          b) Move the related bits into an existing module, making them similar to HDFS, common, tools

          c) Move the related bits into a new module, using the layout that maven really really wants

          d) Skip the unit tests; we don’t have them now

          This is clearly more work than what I really wanted to cover in this branch, but given that there was a specific request to add unit test code for this functionality, I’m sort of stuck here.

          Thoughts?

          ===

          aw Allen Wittenauer added a comment -

          Rebased after committing HADOOP-12866 to trunk.

          aw Allen Wittenauer added a comment -

          Vote called to merge branch.

          aw Allen Wittenauer added a comment -

          Rebased with trunk.

          cnauroth Chris Nauroth added a comment -

          Echoing my response on the mailing lists, I am +1 for merging this to trunk.

          aw Allen Wittenauer added a comment -

          rebased to match trunk

          aw Allen Wittenauer added a comment -

          Vote has passed. Here's the squashed commit of the sub-tasks that will be committed.

          Thanks everyone!

          -00:

          • squashed commit
          aw Allen Wittenauer added a comment -

          Committed to trunk

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9772 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9772/)
          HADOOP-12930. Dynamic subcommands for hadoop shell scripts (aw) (aw: rev 730bc746f9ac6e045e94dc2bc622b16de0159b4b)

          • hadoop-mapreduce-project/bin/mapred
          • hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/test/scripts/hdfs_subcommands.bats
          • hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
          • hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/LoadTypedBytes.java
          • hadoop-tools/hadoop-streaming/src/main/shellprofile.d/hadoop-streaming.sh
          • hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
          • hadoop-tools/hadoop-gridmix/src/main/shellprofile.d/hadoop-gridmix.sh
          • hadoop-hdfs-project/hadoop-hdfs/src/main/shellprofile.d/hdfs.sh
          • hadoop-hdfs-project/hadoop-hdfs/src/test/scripts/hdfs-functions_test_helper.bash
          • hadoop-yarn-project/hadoop-yarn/shellprofile.d/hadoop-yarn.sh
          • hadoop-common-project/hadoop-common/src/main/bin/hadoop
          • hadoop-common-project/hadoop-common/src/test/scripts/hadoop-functions_test_helper.bash
          • hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
          • hadoop-hdfs-project/hadoop-hdfs/src/main/shellprofile.d/hadoop-hdfs.sh
          • hadoop-tools/hadoop-extras/src/main/shellprofile.d/hadoop-extras.sh
          • hadoop-hdfs-project/hadoop-hdfs/pom.xml
          • hadoop-common-project/hadoop-common/src/test/scripts/hadoop_subcommands.bats
          • hadoop-mapreduce-project/shellprofile.d/hadoop-mapreduce.sh
          • hadoop-yarn-project/hadoop-yarn/shellprofile.d/yarn.sh
          • hadoop-hdfs-project/hadoop-hdfs/src/test/scripts/run-bats.sh
          • hadoop-tools/hadoop-distcp/src/main/shellprofile.d/hadoop-distcp.sh
          • hadoop-tools/hadoop-rumen/src/main/shellprofile.d/hadoop-rumen.sh
          • hadoop-yarn-project/hadoop-yarn/bin/yarn
          • hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/HadoopStreaming.java
          • hadoop-tools/hadoop-rumen/src/site/markdown/Rumen.md.vm
          • hadoop-tools/hadoop-archives/src/main/shellprofile.d/hadoop-archives.sh
          • hadoop-mapreduce-project/shellprofile.d/mapreduce.sh
          • hadoop-tools/hadoop-archive-logs/src/main/shellprofile.d/hadoop-archive-logs.sh
          • hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/DumpTypedBytes.java

            People

            • Assignee:
              aw Allen Wittenauer
              Reporter:
              aw Allen Wittenauer
            • Votes:
              0
              Watchers:
              18
