Hadoop Common
  HADOOP-7939

Improve Hadoop subcomponent integration in Hadoop 0.23

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.1
    • Component/s: build, conf, documentation, scripts
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Introduction

      For the rest of this proposal it is assumed that the current set
      of Hadoop subcomponents is:

      • hadoop-common
      • hadoop-hdfs
      • hadoop-yarn
      • hadoop-mapreduce

      It must be noted that this is an open-ended list, though. For example,
      implementations of additional frameworks on top of YARN (e.g. MPI) would
      also be considered subcomponents.

      Problem statement

      Currently there's an unfortunate coupling and hard-coding present at the
      level of launcher scripts, configuration scripts and Java implementation
      code that prevents us from treating all subcomponents of Hadoop independently
      of each other. In a lot of places it is assumed that bits and pieces
      from individual subcomponents must be located at predefined places
      and cannot be dynamically registered/discovered at runtime.
      This prevents a truly flexible deployment of Hadoop 0.23.

      Proposal

      NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255.
      The goal here is to keep as much of that layout in place as possible,
      while permitting different deployment layouts.

      The aim of this proposal is to introduce the needed level of indirection and
      flexibility in order to accommodate the current assumed layout of Hadoop tarball
      deployments and all the other styles of deployments as well. To this end the
      following set of environment variables needs to be uniformly used in all of
      the subcomponent's launcher scripts, configuration scripts and Java code
      (<SC> stands for a literal name of a subcomponent). These variables are
      expected to be defined by <SC>-env.sh scripts and sourcing those files is
      expected to have the desired effect of setting the environment up correctly.

      1. HADOOP_<SC>_HOME
        1. root of the subtree in a filesystem where a subcomponent is expected to be installed
        2. default value: $0/..
      2. HADOOP_<SC>_JARS
        1. a subdirectory with all of the jar files comprising subcomponent's implementation
        2. default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
      3. HADOOP_<SC>_EXT_JARS
        1. a subdirectory with all of the jar files needed for extended functionality of the subcomponent (nonessential for correct work of the basic functionality)
        2. default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
      4. HADOOP_<SC>_NATIVE_LIBS
        1. a subdirectory with all of the native libraries that the component requires
        2. default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
      5. HADOOP_<SC>_BIN
        1. a subdirectory with all of the launcher scripts specific to the client side of the component
        2. default value: $(HADOOP_<SC>_HOME)/bin
      6. HADOOP_<SC>_SBIN
        1. a subdirectory with all of the launcher scripts specific to the server/system side of the component
        2. default value: $(HADOOP_<SC>_HOME)/sbin
      7. HADOOP_<SC>_LIBEXEC
        1. a subdirectory with all of the launcher scripts that are internal to the implementation and should not be invoked directly
        2. default value: $(HADOOP_<SC>_HOME)/libexec
      8. HADOOP_<SC>_CONF
        1. a subdirectory containing configuration files for a subcomponent
        2. default value: $(HADOOP_<SC>_HOME)/conf
      9. HADOOP_<SC>_DATA
        1. a subtree in the local filesystem for storing component's persistent state
        2. default value: $(HADOOP_<SC>_HOME)/data
      10. HADOOP_<SC>_LOG
        1. a subdirectory where the subcomponent's log files are stored
        2. default value: $(HADOOP_<SC>_HOME)/log
      11. HADOOP_<SC>_RUN
        1. a subdirectory with runtime system specific information
        2. default value: $(HADOOP_<SC>_HOME)/run
      12. HADOOP_<SC>_TMP
        1. a subdirectory with temporary files
        2. default value: $(HADOOP_<SC>_HOME)/tmp
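
      As an illustration of how the convention might look in practice, here is a
      rough sketch of a hypothetical hdfs-env.sh: every variable falls back to the
      documented default but can be overridden from the environment. The sketch is
      illustrative only and is not taken from the attached patches.

        # hypothetical hdfs-env.sh -- a sketch of the proposed convention
        export HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$(cd "$(dirname "$0")/.." && pwd)}"
        export HADOOP_HDFS_JARS="${HADOOP_HDFS_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs}"
        export HADOOP_HDFS_EXT_JARS="${HADOOP_HDFS_EXT_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/ext}"
        export HADOOP_HDFS_NATIVE_LIBS="${HADOOP_HDFS_NATIVE_LIBS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/native}"
        export HADOOP_HDFS_CONF="${HADOOP_HDFS_CONF:-$HADOOP_HDFS_HOME/conf}"
        export HADOOP_HDFS_LOG="${HADOOP_HDFS_LOG:-$HADOOP_HDFS_HOME/log}"
        # ...and similarly for BIN, SBIN, LIBEXEC, DATA, RUN and TMP
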
      Attachments

      1. HADOOP-7939-simplified-3.patch.txt
        9 kB
        Roman Shaposhnik
      2. HADOOP-7939-simplified-2.patch.txt
        9 kB
        Roman Shaposhnik
      3. HADOOP-7939-simplified.patch.txt
        9 kB
        Roman Shaposhnik
      4. hadoop-layout.sh
        0.9 kB
        Roman Shaposhnik
      5. HADOOP-7939.patch.txt
        30 kB
        Roman Shaposhnik

        Issue Links

          Activity

          Roman Shaposhnik created issue -
          Allen Wittenauer added a comment -

           In Hadoop 0.20.2 (and previous), if one changed the shell scripts to use BASH_SOURCE instead of just a raw $0, one didn't need to set an environment variable to run hadoop at all. We started to make headway on fixing all these scripts... But I still set HADOOP_HOME just in case something out there didn't work.

          In Hadoop 0.20.

           In Hadoop 0.20.205, HADOOP_HOME was deprecated in favor of HADOOP_PREFIX. I'm not sure what the difference is between the two, but I'm sure the extra character makes it better somehow. In the end, I set HADOOP_PREFIX=$HADOOP_HOME, filled HADOOP_HOME_WARNING_SUPPRESS with various social commentary and was on my way.

          This jira wants to introduce a cacophony of environment variables which makes me question what the actual value is going to be to the end user. I suspect the answer is zero. We seem to have this idea, fascination really, that the Hadoop components are magically separate from each other and that somehow people in the real world deploy different versions of these things on the same grid.

          (I know that one of the supposed benefits of yarn is that people could run different versions of MapReduce. This reminds me of the fact that part of my family is from Missouri. For as long as we have unstable and/or private interfaces and an RPC system that isn't forward and backward compatible, this just isn't a reality.)

          But I still don't understand why we need environment variables to do any of this at all. A decade or so ago, some bright folks got the idea that you could write a xyz-config program that took command line arguments that returned various configuration and build-time options. Eventually this germinated into pkg-config, which provides a somewhat app-independent way to do the same thing. While I'm not necessarily advocating full blown pkg-config support, I do think that adding 12 env vars per component is .... less than ideal. When it comes down to it, everyone is just going to end up back at $HADOOP_HOME with various directories appended.
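
           (For reference, a minimal sketch of the BASH_SOURCE technique mentioned at
           the top of this comment; the variable handling is illustrative, not the
           actual 0.20 script:)

             # illustrative only: derive the install root from the script's own
             # location instead of requiring HADOOP_HOME to be pre-set
             this="${BASH_SOURCE[0]:-$0}"
             bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
             HADOOP_HOME="${HADOOP_HOME:-$(dirname "$bin")}"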

          eric baldeschwieler added a comment -

           I agree with Allen that adding all of these environment variables seems like a step in the wrong direction in terms of system manageability.

          The goal of having a flexible way of organizing hadoop components on disk and lashing them together seems admirable. Maybe there is another way to achieve that?

          What use cases motivate this anyway?

           Also, to Allen's point, the components are not always separable. In a number of cases we might be better off simplifying the system by integrating them more tightly...

          Roman Shaposhnik added a comment -

          @Allen,

           first of all, I think there's a chicken-and-egg problem here. You're [somewhat] correct in saying that right now the state of separation between Hadoop components is not ideal. That said, I think it is unfair to use that as an excuse for NOT working on features that would help clean separation happen down the road.

          Personally, I'm operating under the assumption that clean separation between these 4 parts of Hadoop is desirable. Please let me know if you believe I'm mistaken.

           Now, once we agree on that, the next question is implementation. You are right that a cornucopia of env. variables is NOT an ideal solution. The trouble is – we've already got pretty much as many of them in the current scripts, with all sorts of differences in semantics that prevent straightforward configuration. Worse yet, we've got code like this (ApplicationConstants.java)

            public static final String[] APPLICATION_CLASSPATH =
                new String[] {
                  "$HADOOP_CONF_DIR",
                  "$HADOOP_COMMON_HOME/share/hadoop/common/*",
                  "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*",
                  "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*",
                  "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*",
                  "$YARN_HOME/share/hadoop/mapreduce/*",
                  "$YARN_HOME/share/hadoop/mapreduce/lib/*"
                };
          

           which only makes sense in a tarball deployment scenario and very limited packaging. The fact that 'share/hadoop' is hardcoded all over the place was one of the main motivations for this JIRA. If we want YARN to be framework agnostic, all of the classpath elements from the code I quoted above need to be tweakable (as in – not force me to create dummy symlinks that end with share/hadoop/* just so that the code is happy). I think we can agree on that much.

           If all of the above makes sense to you, I think the final question to be answered is whether to go the route of env. variables or a pkg-config type of system. Essentially what I'm suggesting here is the first step towards pkg-config (with the proposed <component>-env.sh scripts, which we can rename if we want to). The number of variables is somewhat large, but so is the number of deployment aspects that need to be configured (e.g. I can no longer assume that the log files for Hadoop should be in $HADOOP_HOME/logs or even /var/log/hadoop for that matter, etc.). If any single one of the vars jumps out at you as redundant, please let me know.

          Thanks,
          Roman.

          eric baldeschwieler added a comment -

          What would it mean to make YARN framework agnostic?

          Can we work on a proposal for a set of conventions for how a single hadoop component lays out its parts and how it wires in other components?

          Something like Alejandro's old tools proposal would make a lot more sense than just exposing a mass of env vars.

           Again, this would be easier to understand if motivated by some real-world examples of how users' lives would be made easier by this.


          E14 - typing on glass

          Roman Shaposhnik added a comment -

          @Eric,

          What would it mean to make YARN framework agnostic?

          it would mean making YARN rely on configuration instead of explicit knowledge of
          where exactly each framework keeps its jar files and other bits.

          Can we work on a proposal for a set of conventions for how a single hadoop
          component lays out its parts and how it wires in other components?

          I thought that ship has sailed with HADOOP-6255. This JIRA basically makes it possible
          to have the type of deployment that HADOOP-6255 implemented, but also have other types
          of deployments as well.

           It is dangerous to always assume that things are under the same root, simply because
           in a lot of cases that common root ends up being /. E.g. even if we all agree that
           jar files are always located under ${HADOOP_<SC>_HOME}/jar, we can't have that same
           agreement extend to logs, pids, etc. for the simple reason that they are bound to be
           under /var or /mnt or /data in a lot of deployment scenarios.

          Again, this would be easier to understand if motivated by some real world examples of how users
          lives would be made easier by this

           I thought I gave at least one example in my reply to Allen: the YARN constants force me to
           create 2-3 levels of symbolic links just to satisfy the layout requirements. You can see more
           real-world deployment scenarios that motivated this JIRA over here: BIGTOP-316

          Hope this helps.

          eric baldeschwieler added a comment -

          Sounds like some specific work on YARN and how it interacts with legacy MR and new AMs could address your specific example.

          Arun C Murthy added a comment -
          What would it mean to make YARN framework agnostic?
          it would mean making YARN rely on configuration instead of explicit knowledge of
          where exactly each framework keeps its jar files and other bits.

           Roman - YARN doesn't rely on explicit knowledge of each framework; ApplicationConstants is merely trying to pass on the 'standard' deps for applications (i.e. hadoop-common, hadoop-hdfs and hadoop-yarn).

          Roman Shaposhnik added a comment -

          @Arun,

           No disagreement, except for the fact that the current implementation of passing on the 'standard' deps for applications is suboptimal since it cannot be fully parameterized. That's all I'm saying.

          Allen Wittenauer added a comment -

          Personally, I'm operating under the assumption that clean separation between these 4 parts of Hadoop is desirable. Please let me know if you believe I'm mistaken.

          I realize this puts me in a very small minority (a position I'm more than acquainted with in the community), but no, I'm not convinced that the separation is needed or even desirable. Even if it was, as implied before, I'm not convinced that tackling the startup scripts is the correct place to begin on making separation actually work.

           FWIW, I'm fully expecting that once I get around to playing with 0.23 and friends I'll likely be patching it to use plain old $HADOOP_HOME. The already present explosion of env vars post-0.20.205 seems just as useless to end users.

          Eli Collins added a comment -

           How about each component has a single HOME env var and an assumed set of subdirs (logs, pid, tmp, bin, sbin, etc.) and Bigtop and friends use symlinks?

          Roman Shaposhnik added a comment -

          @Eli

          Two issues:

           1. The currently assumed subdirectory structure is unnecessarily deep, with extra share/hadoop/<component> path elements hardcoded
           2. Symbolic links as a means of configuration are restrictive. You can't have different environments with different settings at the same time, etc.

          That said, if the consensus is not to make this kind of configuration possible, you're right – we can simply create a whole bunch of symbolic links faking whatever structure Hadoop components expect.

          Perhaps, I'm expecting too much, but I grew accustomed to my software letting me tweak these types of things however I want without resorting to changing files in what might as well be a R/O file-system (/usr that is).

          Bruno Mahé added a comment -

          To add to what Roman has said, this proposal does not prevent the use of a default layout. Users would not have to set 30 env variables before getting something usable. This proposal could even maintain the current default layout.
          So this feature would mostly be invisible to end users while allowing others with more complex needs to deploy it in a manner more suitable to their environment.

          eric baldeschwieler added a comment -

          Hard to understand how adding dozens of things to document and test that add no value in the common case is a good idea. I'm sure there are more elegant ways to achieve the desired result. This seems like a substantial tax for a very small return.

          I'd encourage folks to think of a more elegant solution, ideally one that clearly adds some value to the common case user. Otherwise, the symlink solution seems like proof that we don't need to add code to address uncommon cases.

          Maybe the whole system of how we wire components together can be abstracted at a higher level, giving deployers more interesting degrees of freedom?

          Bruno Mahé added a comment -

           I am not sure I follow. It's not because this may not be visible to end users that this does not add value.
           Furthermore, in an ideal case end users should not have to tweak such values. This is not their responsibility. This is the responsibility of the deployment part.

          Eli Collins added a comment -

          @Roman, I agree that the current dir structure (share/hadoop/..) is too deep. I also agree with Eric that we can find a much simpler implementation that achieves the goal of allowing the projects to be packaged independently. How about a single level of dirs per project and a single HOME variable per project? Bigtop would only need one symlink per directory, which should be maintainable (eg see below).

          YARN_HOME/bin         # -> /usr/bin/yarn
                   /sbin        # -> /usr/sbin/yarn
                   /conf        # -> /etc/yarn/conf  (or use alternatives)
                   /logs        # -> /var/log/yarn
                   /pids        # -> /var/run/yarn
                   ..
          MAPRED_HOME/bin       ...
                     /sbin
                     /conf
                     /logs
                     ..
          
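           (A sketch of how packaging could wire up such a layout with one symlink per
           directory; /usr/lib/yarn as YARN_HOME is an assumption, and the link targets
           follow the comments in the listing above:)

             # illustrative packaging step, not taken from Bigtop
             install -d /usr/lib/yarn
             ln -s /etc/yarn/conf /usr/lib/yarn/conf
             ln -s /var/log/yarn  /usr/lib/yarn/logs
             ln -s /var/run/yarn  /usr/lib/yarn/pids
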
          Roman Shaposhnik added a comment -

          @Eli,

           1. I'm not at all against simplification of the layout. However, doing that will effectively mean reverting HADOOP-6255. If you think that's the way to go – let me know and I can file a separate JIRA.
           2. Your proposal of layout simplification forces the user to recreate symbolic links every time she needs to change the location of things like log files, pids or, in some cases, even persistent data. I tend to view this as an inferior solution compared to a pretty well defined and easy to generalize set of env. variables.
          Roman Shaposhnik added a comment -

          @Eric

           I believe your constant appeal to "complexity" and "un-manageability" of the proposed solution is not only a red herring here, but also serves as a distraction from the current sad state of things. Hadoop scripts already have as many (if not more!) env. variables dedicated to locating various things.
          As one of its goals, this proposal aims at rationalizing the naming convention and clearly documenting what every single one of them does.

           Finally, when you say that there could be a simpler solution, I'm curious, but not hopeful. If you look at an example of any modern Unix packaging/deployment system, you'd see that they all have pretty much the same set of knobs to tweak (e.g. read the section of the following RPM doc that says "Use these macros wherever possible to avoid hard-coded paths and settings": http://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html/RPM_Guide/ch09s07.html ). Even autotools have pretty much the same set of knobs. I think there's a reason the UNIX community hasn't come up with a simpler implementation in all these years.

          eric baldeschwieler added a comment -

          I agree that flexibility of layout is useful. I disagree that putting lots of directory paths in ENV vars will make the system easier to use or manage. It has a small but real cost to implement and maintain and then requires users to get these variables right for the components to work together. This all seems non-ideal.

          If Hadoop's general scheme for file layout is causing problems and needs to be simplified, let's discuss that.

          Eli Collins added a comment -

          @Roman,
          Simplifying the layout as described is not the same as reverting HADOOP-6255. I agree that symlinks are not optimal, but it seems better than introducing a lot of new environment variables. And if it's an issue for some reason we can keep an explicit environment variable eg for the log dir (ie we don't need to go whole hog).

          Rather than file a new jira, can we do this here? It's just another implementation of the same goal - a Hadoop layout that is (1) simple for users w/o requiring a separate project and (2) supports independent packaging of the projects.

          Roman Shaposhnik added a comment -

          @Eli,

          can you, please, let me know what is wrong with uniformly named environment variables that are used in a very straightforward way in all of the scripts?

           In fact, given the proposed naming convention, we can have a common small bit of code in hadoop-common that will parse the values in these variables and do the substitution regardless of the project that is using them. Imagine this – you will no longer have to maintain custom code in 4 different projects (and potentially downstream projects like Pig and Hive) just to add *_OPTS to the java invocation. Will this not be nice?

          I do not understand this irrational fear that somehow forces us to settle for a half-baked solution (e.g. symlinks). Please enlighten me.
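
           (To make the shared-helper idea above concrete, a rough sketch of what such a
           bit of common code might look like; the function name and the exact set of
           variables it fills in are hypothetical, not from any attached patch:)

             # hypothetical helper in hadoop-common: give any subcomponent's
             # HADOOP_<SC>_* variables the documented defaults unless already set
             hadoop_fill_defaults() {
               local sc=$1 home_var="HADOOP_${1}_HOME"
               local home="${!home_var:?${home_var} is not set}"
               local sub var
               for sub in CONF=conf LOG=log RUN=run TMP=tmp BIN=bin SBIN=sbin; do
                 var="HADOOP_${sc}_${sub%%=*}"
                 [ -n "${!var:-}" ] || export "${var}=${home}/${sub#*=}"
               done
             }
             hadoop_fill_defaults HDFS   # the same helper would serve YARN, MAPREDUCE, ...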

          Konstantin Boudnik added a comment -

           I guess I share the confusion here: why are symlinks any better than a well-defined, documented set of variables? So far none of the comments above have pointed out the benefits of the former over the latter. I would love to hear them, if they exist.

          eric baldeschwieler added a comment -

          Cos, Roman,

           If one wants to add complexity to the projects, the burden of proof lies with you. So far we've ID'd a solution that requires zero change, and Eli has proposed a change that would reduce the friction of what you want to do.

          I feel harangued, but not educated. I don't understand how your change would improve the world for identified important use cases. I do understand that it would add potential confusion and some maintenance overhead.

          Absent something new added to the discussion, I don't see this as productive.

          E14

          Konstantin Boudnik added a comment -

          Eric,

           I fail to see any solutions being identified and accepted here; I think you have a leap of faith here.

          > If one wants to add complexity to the projects, the burden of proof lies with you.
           It works both ways, doesn't it?

           There is the original proposal and a counter one w/ symlinks. I don't see a benefit of the latter and there's no argument to support it. Unfortunately, comments of the sort
           > Absent something new added to the discussion, I don't see this as productive.
           don't add any productivity nor technical merit to the discussion.

          Eli Collins added a comment -

          can you, please, let me know what is wrong with uniformly named environment variables that are used in a very straightforward way in all of the scripts?

          There's nothing wrong with using environment variables, I just don't think we need them.

          The problem statement mentions two issues which are, in my opinion, orthogonal: (1) making it easy to treat all the projects independently and (2) requiring the sub-dirs (logs, pids, bin, etc) be at pre-defined places prevents them from being "dynamically registered/discovered during the runtime".

           Wrt #1, having a HOME variable per project solves this issue, and IIUC we have that today, though the code should be further simplified. E.g. replace $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib with just $HDFS_HOME/lib, rather than make it even more configurable. The packaging (here or in Bigtop) can link lib to whatever host-specific location it prefers.

          Wrt #2, I'm not sure we need to dynamically register and discover these locations at runtime. Even if we do, I don't see how hard-coding the location prevents this - alternatives allows for paths to be dynamically registered and discovered using symlinks, should work here too.

          In short, I'd like to see the relevant code in Hadoop be simpler, with fewer variables, and fewer cases to test, which hopefully translates into less time spent maintaining this code and fewer bugs. I think we can do that, while accomplishing #1 for packaging, w/o precluding the packaging from accomplishing #2 if it wants to.

          Allen Wittenauer added a comment -

          can you, please, let me know what is wrong with uniformly named environment variables that are used in a very straightforward way in all of the scripts?

          Yes, actually, there is a problem with this solution. There is a high risk of overflowing the command buffer on quite a few operating systems, especially if we don't de-dupe things like the classpath or if a user puts in a fully qualified dir path instead of something relative.

          Konstantin Boudnik added a comment -

          There is a high risk of overflowing the command buffer

          Ah, finally a material argument! Thanks Allen - that makes sense.

          Roman Shaposhnik added a comment -

          @Allen,

          What you're saying makes sense. Two questions though:

           1. Are you saying the proposed solution is worse than the current mess of naming we've got in our scripts? (Once again, I fully agree with your general point – I'm just curious if it extends to the present situation as well.)
           2. Given that uniformity in naming would allow us to handle all such var evals/manipulations in a single place (basically having one script that can act as yarn/hdfs/mapred, etc.), wouldn't you agree that addressing things like de-duping will become easier compared to doing such de-duping in half a dozen different places?
          Allen Wittenauer added a comment -

          a) Yes, the proposed solution is worse. Some of these env vars are not present and/or less flexible presently which in turn means the command line is smaller since the java code can make some safe assumptions.

           b) It's a toss up. Dedup of jars would have to happen post-expansion. The precedence logic gets a bit crazy when faced with something like /some/path/a/class-v1.jar being different than /some/path/b/class-v2.jar (hai avro) and picking which is the correct one for the task at hand. Additionally, a lot of extra jars are likely to show up that wouldn't be necessary for a given module and could actually get cut from the path. For example (and ignoring the current reality), the mapred layer shouldn't really ever be loading a big chunk of the HDFS jars. By merging these env vars in common code, one pretty much guarantees that such separation between modules never occurs and might even work against that goal.

          Roman Shaposhnik added a comment -

          @Allen,

          Some of these env vars are not present and/or less flexible presently which in turn means the command line is smaller since the java code can make some safe assumptions.

          FYI: I'm not sure where you got that impression, but the exec JAVA lines in all the scripts stay pretty much the same. After all the variable expansion, the resulting command line looks exactly like it looks today. In fact, that's been my goal in implementing this proposal – to make sure that the current behavior is not perturbed.

          We already have almost all of the proposed variables, the trouble is that their naming and usage is very inconsistent. That's what this proposal was aiming at correcting.

          Roman Shaposhnik added a comment -

          @Eli,

           I strongly disagree with your take on complexity. In fact, while implementing parts of this proposal I was astonished at all of the inconsistencies I've discovered so far in the use of already present variables. It is next to impossible to reason about the scripts in their current form. To me this is the kind of complexity worth eliminating (some of the HOME variables start with HADOOP, some don't; some arguments setting vars start with the subcomponent's name, some don't, etc.).

          That said, I'm tired of fighting what seems to be a vocal minority on this JIRA. It is fine if this JIRA gets closed as will-not-fix. As far as Bigtop is concerned we always have an option along the lines that Allen has mentioned – forking and patching the shell scripts to the point of them making sense. In fact, in Bigtop it'll be somewhat easier.

          As to your counter-proposal of changing the default layout to be a simple flat

           ${HADOOP_XX_HOME}/[bin|lib|...]
          

           I do not feel comfortable tackling that at the moment. The way I see it, implementing
           changes to the layout will require changes in the following areas:

          1. assembly portion of all of the Hadoop subcomponent's Maven builds
          2. RPM/DEB packaging code
          3. config file generation code

          I don't have familiarity with that portion of the Hadoop project (it is all new code) and it will be a significant investment of time for me to get up to speed.

          Feel free to tackle it, though. If it happens – Bigtop's life is going to be a little bit easier.

          Eli Collins added a comment -

          In fact, while implementing parts of this proposal I was astonished at all of the inconsistencies I've discovered so far in use of already present variables. It is next to impossible to reason about scripts in their current form. To me this is the kind of complexity worth eliminating (some of the HOME variables start with HADOOP, some don't, some arguments setting vars start with subcomponents name, some don't, etc.).

          I agree, that's why I'm suggesting simplifying the existing HOME* variables instead of introducing new ones. File a jira? Contribute a patch?

          Roman Shaposhnik added a comment -

          @Eli,

          that work would depend on flattening of the layout which clearly sounds like a separate JIRA to me.

          Eli Collins added a comment -

          I don't see how cleaning up the variable names and other inconsistencies depends on flattening the layout. We can make the current code less complex w/o a flat layout.

          Tom White added a comment -

          We already have almost all of the proposed variables, the trouble is that their naming and usage is very inconsistent. That's what this proposal was aiming at correcting.

          A patch that fixes the inconsistencies for 0.23 and trunk would be great to have, and is separate from introducing any new environment variables. Would you be able to contribute such a patch?

          Also, ideally as a user I'd be able to set one HOME environment variable and have all the others set to reasonable defaults. I'm not sure how true that is in 0.23/trunk today.

          Todd Lipcon added a comment -

          People seem to have missed Bruno's point here:

          Perhaps, I'm expecting too much, but I grew accustomed to my software letting me tweak these types of things however I want without resorting to changing files in what might as well be a R/O file-system (/usr that is).

          This is the main issue of using symlinks for configuration. EG if you want the mapred logs to go into /data/2/mapred/logs, you'd need to modify a symlink at /usr/share/mapred/ or /usr/local/mapred/ or whatever, which isn't generally considered kosher (eg rpm verification will barf). Folks expect to use a configuration file to pick these locations – as we already support in a couple cases by allowing users to override HADOOP_PID_DIR in hadoop-env.sh for example.

          So I think Roman's point is that we allow this "override" in some places but not in others, and the names for the overrides are inconsistent. So I agree with his point that we should make them consistently named across the components.

          On the other hand, the "configurability" only really makes sense for directories like DATA, TMP, PIDS, LOGS, etc, where Hadoop will be writing data. For JARS and NATIVE_LIBS I'm not sure I understand the benefit. Something like EXT_JARS seems like a gray area - we might expect users to be able to configure a list of directories here in order to "wire together" multiple packages or something. Roman/Bruno - can you give a concrete use-case for why you'd want to override JARS,BIN,SBIN, etc?
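          For what it's worth, a minimal sketch of that config-file style of override (the paths are examples only, not recommended defaults):

          # Hypothetical hadoop-env.sh fragment: redirect writable locations via
          # configuration rather than symlinks, so /usr can stay read-only.
          export HADOOP_LOG_DIR=/data/2/mapred/logs
          export HADOOP_PID_DIR=/var/run/hadoop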

          Steve Loughran added a comment -

          I'm going to start by saying I couldn't get the tarball to start up. Here are some of the problems I hit:

          • HADOOP-7838 - sbin/start-balancer doesn't
          • MAPREDUCE-3430 - Shell variable expansions in yarn shell scripts should be quoted
          • MAPREDUCE-3431 - NPE in Resource Manager shutdown
          • MAPREDUCE-3432 - Yarn doesn't work if JAVA_HOME isn't set

          The key problem was the number of env variables to set, something wrong with env propagation (MAPREDUCE-3432 shows this), no "how to get up and running in 5 minutes" document, and the fact that some shell scripts contain assumptions about code layout that aren't valid; HADOOP-7838 shows this.

          There's probably an underlying problem: no testing that the tarball works when deployed onto a clean OS into a directory with a space in it somewhere up the tree. This isn't that hard to write; a few ant tasks to <scp> the file then <ssh> some commands -and without it you can't be sure such problems have gone away and won't come back.
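          A minimal sketch of such a smoke test (hypothetical host and tarball names, plain shell rather than ant tasks, and not an existing test in the tree):

          TARBALL=hadoop-0.23.0.tar.gz     # assumed artifact name
          HOST=clean-test-host             # assumed freshly installed machine
          scp "$TARBALL" "$HOST:/tmp/"
          # Unpack into a directory with a space in its path and try to start HDFS.
          ssh "$HOST" "mkdir -p '/tmp/hadoop test' && \
                       tar -xzf /tmp/$TARBALL -C '/tmp/hadoop test' && \
                       '/tmp/hadoop test'/hadoop-*/sbin/start-dfs.sh"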

          If I have that problem, I expect end users will, and fear for the traffic on hadoop-*-users. That's not just pain and suffering, it will cause people to not use Hadoop. As you don't pay for a free download, you haven't put enough money on the table to spend a day getting the thing up and running on your desktop. Any bad installation experience will put people off.

          Tom White's goal of "one single env variable" is what I'd like. Set that, have the others drive off it (unless overridden) - and work it out based on bin/something if it isn't predefined.

          Looking at this proposal,

          1. I like the idea of a standard layout that can be tuned, so that we have the option to point to different versions of things if need be, but you don't need to set up everything in advance.
          2. You can't rely on symlinks in windows-land, which, given the recent MS support for Hadoop on Azure, may matter in production as well as dev. And remember, those Windows desktop installs probably form the majority of single-user deployments.
          3. Windows also has a hard limit of 1024 chars on command lines; it's the thing that tops out first on long classpaths (forcing you to set the CLASSPATH env variable and then call java, but even that has limits).
          4. We need some tests. I know BigTop does this, but would like some pushed up earlier into the process, so all HADOOP- HDFS- and MAPREDUCE- patches get regression tested against the scripts in their initial tests.
          5. Todd's points about config, tmp &c raise another point. per-user options and temp dirs should be in different paths from the binaries. I don't want the temp files on the root disk, and just because Hadoop was installed by root doesn't mean I shouldn't be able to run Hadoop with my own config.
          6. Redirectable config/tmp also makes it trivial to play with different installation options without editing conf files.

          In an ideal world we'd also replace the bash scripts with python as it's a more readable/editable language, less quirky and sets things up for more python-round-the-edges work. I don't know enough about python on windows to know the consequences of that; I'd expect python to be native (not cygwin). I'll put that to one side for now.

          For me, then

          • A root hadoop dir that has things out underneath is good.
          • I would like a way to point to my config/tmp dirs without needing to edit symlinks.
          • This stuff needs to work on windows too.
          • The tarball needs installation tests.
          Alejandro Abdelnur added a comment -

          Arriving a bit late to the party,

          It seems we have a consensus that there is a need for simplicity for end-users and greater flexibility for packagers.

          Echoing Tom, and going a bit further, things should work even without setting any ENV variable for the default TAR distribution.

          Still, we need to enable packagers (for the different OSes) to easily tweak where things go. And given that different OSes have different standards it is not possible to make assumptions on where things start from the ROOT level or from a basedirectory (Linux(es), Solaris, OSX, Windows).

          The use of symlinks, as Bruno, Todd and Steve pointed out, does not seem a viable solution.

          Regarding Allen's comment: using several ENV variables does not have an impact on overflowing the command buffer; what has an impact is not deduping things like the classpath. Because of this, regardless of whether there are several ENV variables or not, better handling of the classpath has to be done. For this, we could leverage the Java 6 '*.jar' wildcard handling in the classpath. Also, HADOOP-7934 would help as it ensures all dependency versions are exactly the same across Hadoop; this would allow us to effectively dedup the JARs that end up in the classpath.
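          A sketch of what that could look like (assuming the share/hadoop layout discussed above; since Java 6 a trailing '*' classpath entry expands to all jars in that directory):

          CLASSPATH="$HADOOP_CONF_DIR"
          CLASSPATH="$CLASSPATH:$HADOOP_PREFIX/share/hadoop/common/*"
          CLASSPATH="$CLASSPATH:$HADOOP_PREFIX/share/hadoop/common/lib/*"
          export CLASSPATH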

          For the default TAR distribution, the proposal does not require an end-user to set up 12 ENV vars per component, nor to set up one HOME variable per component (common, hdfs, mapred, yarn), but a single HADOOP_HOME. The component HOME variable and the component's 11 sub-variables are resolved to default values from HADOOP_HOME. Even HADOOP_HOME could be resolved by default (if not set) based on the directory the script is being invoked from.

          On Todd's question about whether it is possible to get rid of the *_BIN, *_SBIN, *_LIBEXEC, *_JARS, *_EXT_JARS, *_NATIVE_LIBS variables: I don't think so, as all these bits may end up in different locations depending on the packager/OS (/opt, /usr/lib, /usr/share, /usr/share/local).

          The key thing is that most, if not all, of these ENV variables will be transparent to end-users; only packagers care about them.

          eric baldeschwieler added a comment -

          How can environment variables be the right way to push installation information around?

          Alejandro Abdelnur added a comment -

          @Eric, that is exactly how hadoop-env.sh works today where you can set all the following ENV vars:

          JAVA_HOME
          HADOOP_CLASSPATH
          HADOOP_HEAPSIZE
          HADOOP_OPTS
          HADOOP_NAMENODE_OPTS
          HADOOP_SECONDARYNAMENODE_OPTS
          HADOOP_DATANODE_OPTS
          HADOOP_BALANCER_OPTS
          HADOOP_JOBTRACKER_OPTS
          HADOOP_TASKTRACKER_OPTS
          HADOOP_CLIENT_OPTS
          HADOOP_SSH_OPTS
          HADOOP_LOG_DIR
          HADOOP_SLAVES
          HADOOP_MASTER
          HADOOP_SLAVE_SLEEP
          HADOOP_PID_DIR
          HADOOP_IDENT_STRING
          HADOOP_NICENESS
          
          eric baldeschwieler added a comment -

          I understand. I'm just observing that this is not exactly best practice and institutionalizing more of it doesn't feel like we are moving the product forward to me. Environment variables should really be to allow users to personalize their environment, not for admins to stitch the base config together.

          Alejandro Abdelnur added a comment -

          @Eric, this is not for admins but for packagers. All these ENV vars would be private to the scripts (they are sourced within), admins don't have to deal with them.

          Alejandro Abdelnur added a comment -

          (I'm referring to the ones that this JIRA is proposing)

          Roman Shaposhnik added a comment -

          First of all, I would like to thank everybody for their feedback, and especially Alejandro for summing up the approach I'm taking.

          I'm attaching a patch for common that, I hope, takes care of some of the concerns raised on this JIRA. I'm also going to open up corresponding JIRAs for HDFS and MAPREDUCE so that I can provide patches for review over there.

          Please note, that this patch has been somewhat tested on Linux, but hasn't seen any exposure to Cygwin nor MacOS/X nor Solaris. I'm going to do that over the weekend.

          Once again, thanks to everybody who provided practical feedback and thanks for your time (in advance!) for reviewing the patch.

          Roman Shaposhnik made changes -
          Field Original Value New Value
          Attachment HADOOP-7939.patch.txt [ 12509704 ]
          Roman Shaposhnik made changes -
          Link This issue requires MAPREDUCE-3635 [ MAPREDUCE-3635 ]
          Roman Shaposhnik made changes -
          Link This issue requires HDFS-2761 [ HDFS-2761 ]
          Roman Shaposhnik added a comment -

          Attaching a newly introduced file as-is for ease of review

          Roman Shaposhnik made changes -
          Attachment hadoop-layout.sh [ 12509706 ]
          Alejandro Abdelnur added a comment -

          Roman,

          It looks like the patch is trimming a lot of fat, nice!

          I have not tested it, but here are a couple of comments from browsing it:

          • why do the hadoop-common scripts try to resolve the hdfs/yarn/mapreduce classpath? Can we get rid of that?
          • the heap size configuration: your patch removes the assumption that the value is in megabytes. While that makes sense, it may break many existing configurations being reused. Could this be a bit smarter and assume megabytes if there is no unit? (See the sketch below.)
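          Something along these lines, as a sketch only (variable names assumed, not necessarily what the patch uses):

          case "$HADOOP_HEAPSIZE" in
            "")        ;;                                          # unset: leave the JVM default alone
            *[mMgGkK]) JAVA_HEAP_MAX="-Xmx${HADOOP_HEAPSIZE}" ;;   # unit given: pass it through
            *)         JAVA_HEAP_MAX="-Xmx${HADOOP_HEAPSIZE}m" ;;  # bare number: assume megabytes
          esac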

          I'll wait for your weekend testing/improvements to give it a spin.

          Thxs.

          Eric Yang added a comment -
          1. The proposed layout ignores the repeated history: the community previously split into HADOOP_[COMMON|HDFS|MAPRED]_HOME, which made Hadoop painful to manage and caused developers a lot of grief. (HADOOP-4868)
          2. The course was reversed to merge Hadoop back into a single package to improve integration. (HADOOP-7642)
          3. The proposed change also ignores integration with the base OS. (HADOOP-6255)
          4. Native libraries should not be placed in $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native because it is inconsistent with the recommended unix file structure layout. The recommended value for native libraries should be $HADOOP_<SC>_HOME/lib.
          5. HADOOP_PREFIX (or HADOOP_HOME) is programmatically resolved by the script; the user does not need to set the HADOOP_HOME variable in today's scripts. This should be the default behavior.
          6. I recommend using the current structure because the proposed HADOOP-6255 structure has been implemented in most Hadoop related projects. Related projects are adapting to this change. It may be counterproductive to reinvent the wheel and send shock waves across Hadoop related projects at this time.
          Peter Linnell added a comment -

          This first patch looks good, and putting my packager's hat on, it is a good first step to add some sanity.

          While I am new to Apache and Hadoop, FWIW, I've been building rpms for years and I have maintain(ed) a number of packages and repos for openSUSE and Fedora.

          My temptation is to give this a strong +1, with a couple of comments:

          Can you extend the comment in line 513 to add: 'Once disabled, you might need to reboot your machine'? This is surely the case with SLES/openSUSE.

          I do not understand why libexec is getting moved or even used. I realize it's not your fault, but I am not a fan of doing that: 1. It is not consistent across distros. 2. There is a move afoot to move all binaries to /usr, see https://fedoraproject.org/wiki/Features/UsrMove (openSUSE 12.2 is currently planned to do this as well).

          As for https://issues.apache.org/jira/browse/HADOOP-6255, well, it's sub-optimal. That surely would never pass the smell test to get into a major linux distro as is. I might get ambitious and provide a patch - if it would be accepted. I do not see "integration" with the base OS, as so many standard macros and locations are overridden by un-needed defines.

          I do not see HADOOP-4868 as relevant and in fact think this is a more elegant solution sourcing environment variables in one place.

          Roman thanks for this.

          Eric Yang added a comment -

          Fedora's move of /[bin|lib] to /usr/[bin|lib] has no conflict with the proposal from HADOOP-6255. In fact, the current layout works well with HADOOP_PREFIX=/usr. The current structure is generalized across LSB (with the exception that $HADOOP_PREFIX/libexec does not exist on OpenSUSE). The current layout is generalized to work on Solaris, Darwin, Linux, and Cygwin. The proposed change of $HADOOP_HOME/lib/native would be a major incompatibility with naming conventions among most linux distros.

          The default config directory was renamed from $HADOOP_HOME/conf to $HADOOP_PREFIX/etc/hadoop to avoid a config directory naming conflict. This patch attempts to revert it back to $HADOOP_HOME/conf. There is a potential naming collision with other projects, and it should be avoided.

          -bin=`dirname "${BASH_SOURCE-$0}"`
          -bin=`cd "$bin"; pwd`
          +# resolve links - $0 may be a softlink
          +PRG="${0}"
          +while [ -h "${PRG}" ]; do
          +  ls=`ls -ld "${PRG}"`
          +  link=`expr "$ls" : '.*-> \(.*\)$'`
          +  if expr "$link" : '/.*' > /dev/null; then
          +    PRG="$link"
          +  else
          +    PRG=`dirname "${PRG}"`/"$link"
          +  fi
          +done
          +SELF_BASEDIR="`dirname ${PRG}`"
          +SELF_BASEDIR="`cd ${SELF_BASEDIR}/..;pwd`"
          

          What is the reason to revert symlink resolution code back?

          Shouldn't $HADOOP_COMMON_HOME/lib/native be renamed to $HADOOP_COMMON_HOME/lib to be consistent with naming conventions of *nix C library directory structure?

          Bruno Mahé added a comment -

          Eric>

          • $HADOOP_PREFIX/libexec does not exist on debian/ubuntu either.
          • I am also not sure I follow regarding $HADOOP_PREFIX/etc/hadoop. /usr/etc/* would be wrong in any case since /usr tends to be read-only.
          • The current layout is fine for tarballs. But it is not suited for an integrated deployment with the system. But I would rather not distract from the current discussion/ticket.

          Roman> Any progress on your testing/improvements?

          Roman Shaposhnik made changes -
          Link This issue is related to BIGTOP-316 [ BIGTOP-316 ]
          Eric Yang added a comment -

          Bruno, here are documents regarding /usr/etc:

          http://lists.debian.org/debian-devel/2000/12/msg01888.html
          http://www.ru.j-npcs.org/usoft/WWW/www_pathname.com/fhs/1.2/fsstnd-4_5.html

          It is optional, but it is not wrong in any case. Hadoop packages can use /usr/etc/hadoop for configuration, or override it with HADOOP_CONF_DIR=/etc/hadoop. The current system incorporates the best ideas, and I hope people who champion modifications to FHS will think outside of FHS before crippling Hadoop.

          /usr/libexec is an idiom from the bsd world. If Cloudera feels strongly that this should be named /usr/lib/hadoop/libexec in the SUSE or Debian world, I don't have any objection. If a good idea is against FHS, then FHS should be changed, rather than treating FHS as the only standard. The Fedora file structure layout proposal is one such change. I would not lose any sleep over complying with FHS.

          Bruno Mahé added a comment -

          Eric,
          Thanks for the feedback!
          But first of all, this is not a Cloudera issue. I am here as an Apache contributor working on getting Apache Hadoop and Apache Bigtop (incubating) better. I don't do this because of my employer, I do this because I believe this ticket will improve Apache Hadoop and make Apache Hadoop deployment easier, more maintainable and robust.

          Regarding /usr/etc, I still don't see how it's not a bad location. This idea does not seem very popular in your first link, and your second link recommends against using it. Later versions of the FHS even say "Note that /usr/etc is still not allowed: programs in /usr should place configuration files in /etc." around http://www.debian.org/doc/packaging-manuals/fhs/fhs-2.3.html#USRLOCALLOCALHIERARCHY and the section you refer to seems to have disappeared since then.
          But again, this patch does not break the current layout. So if variables are not overridden, the layout will stay the same. No one will have to touch that variable to get Hadoop working, so I am not sure I see the issue.

          Regarding libexec, I would rather use it when available and conform to the target GNU/Linux distribution when not available. But again this patch will not change the default layout of Hadoop (as far as I know) and no one will have to touch that variable to get Hadoop working. So I still don't see the issue.

          I don't want to distract attention from Roman's patch, so if you wish to talk more about the layout, feel free to send me an email or open a ticket detailing the changes you would like to make to the Apache Hadoop layout.

          Tom White added a comment -

          > What is the reason to revert symlink resolution code back?

          Agreed - the work in HADOOP-7089 came up with a solution for symlink resolution which we should continue to use.

          Roman Shaposhnik added a comment -

          @Tom, what gives you the impression that HADOOP-7089 will be reverted? I'm not reverting it, merely generalizing it not to depend on functionality that is not always present on all systems (cd -P and pwd -P).

          The code that I'm putting in place of it is already present in HADOOP tree:
          http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/libexec/httpfs-config.sh?view=markup

          That code has been in Tomcat's scripts for quite some time and I believe this is the most portable symlink resolution code I've seen to date.

          Do you disagree?

          Roman Shaposhnik added a comment -

          @Eric Y,

          let me try to address some of your concerns:

          1. I think it is pretty clear by now that HADOOP-6255 is not 100% compatible with all of the OSes that Hadoop has to support. Hence I think it would be safe to assume that we agree that the layout needs to be flexible. The question then becomes what's the best implementation to make it flexible AND keep HADOOP-6255 as the default layout for the tarball and OSes where it makes sense.
          2. You mentioned that you don't want libraries under /native – fine, I can change the default layout to put them into any other place. Do you want them to be mixed with JARs under HADOOP_<SC>_HOME/lib?
          3. The default behavior for end users is preserved – they do NOT need to set any variables to run hadoop (not HADOOP_HOME nor HADOOP_PREFIX).

          Eric, please let me know if you have any further concerns, otherwise, I'll modify my patches accordingly.

          Eric Yang added a comment -

          you mentioned that you don't want libraries under /native – fine, I can change the default layout to put it into any other place. Do you want them to be mixed with JARs under HADOOP_<SC>_HOME/lib ?

          Only native libraries should go into HADOOP_PREFIX/lib. This will enable the runtime linker to discover native libraries efficiently. Jar files should be hosted in HADOOP_PREFIX/share/hadoop/<SC> or HADOOP_PREFIX/share/hadoop/<SC>/lib because they are platform-independent files.

          The current system is partitioned into:

          HADOOP_PREFIX/share/hadoop/common
          HADOOP_PREFIX/share/hadoop/common/lib (third party jars)
          HADOOP_PREFIX/share/hadoop/hdfs
          HADOOP_PREFIX/share/hadoop/hdfs/lib (third party jars)
          HADOOP_PREFIX/share/hadoop/mapreduce
          HADOOP_PREFIX/share/hadoop/mapreduce/lib (third party jars)
          

          In addition, it also supports separated layout like this:

          HADOOP_HDFS_HOME/share/hadoop/hdfs
          HADOOP_HDFS_HOME/share/hadoop/hdfs/lib
          YARN_HOME/share/hadoop/mapreduce
          YARN_HOME/share/hadoop/mapreduce/lib
          

          These features ensure that end users can deploy subcomponents as a merged layout or an independent layout. If you can preserve all of the above, then it will satisfy end user requests. It should be fine to rename YARN_HOME to HADOOP_YARN_HOME for consistency.
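          A sketch of how a launcher could serve both layouts (illustrative only): defaulting each component HOME to HADOOP_PREFIX makes the merged and the separated layouts look the same to the script.

          HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$HADOOP_PREFIX}"
          CLASSPATH="$CLASSPATH:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*"
          CLASSPATH="$CLASSPATH:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*"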

          I am +0 for renaming libexec to HADOOP_PREFIX/lib/hadoop/libexec, but it is less efficient from a runtime linker perspective.

          Bruno Mahé added a comment -

          My main interest on this ticket is to enable different deployment layouts. So if these changes to the default layout would help moving this ticket forward, I would encourage Roman to follow them.

          But I would recommend focusing the default layout on what would be a tarball deployment instead of the current one-size-fits-all approach. For instance, I understand the purpose of HADOOP_PREFIX/lib is to put all native libraries in a system-level directory where HADOOP_PREFIX is assumed to be /usr, but this would only work for a subset of users and deployments. The simplest counter-example would be 64bit GNU/Linux distributions, most of which would deploy native libraries in /usr/lib64 and use /usr/lib for 32bit native libraries. The new multi-arch feature in debian could also make matters more complex. The same would apply to libexec and /usr/etc.
          Focusing the default layout on what would be a tarball deployment could bring several benefits such as:

          • Simpler layout since there is no specific system layout to follow and all components can be completely separated
          • Not having a dual use for a "lib" directory
          • More consistency since the layout would stop trying to be 5 different things at the same time
          • More coherent layout since packages would be able to really use /etc/hadoop for their config for instance
          • Tarballs have a different use case than packages.
            The downside is that there may be a cost involved with packages having a different layout than the tarball. But from my experience with Apache Bigtop, where the layout may differ depending on the target GNU/Linux distribution, I am confident such costs would be negligible compared to the gains of this approach.
            But again, this is only my personal opinion on the current layout, and I am really more interested in getting this improvement into Apache Hadoop.
          Eric Yang added a comment -

          Not having a dual use for a "lib" directory

          Debian makes use of /usr/lib for 64bit binaries and /usr/lib32 for 32bit binaries on amd64.
          The usage of lib[64|32] should be at the discretion of the packager for the targeted platform. Surely, packaging for Hadoop isn't perfect, but the Hadoop 1.0.0 packages have a much cleaner layout than the Bigtop-0.2.0 hadoop packages, in my opinion.

          More coherent layout since packages would be able to really use /etc/hadoop for their config for instance

          For Hadoop 1.0.0 packages, the config files are in /etc/hadoop. "rpm -ql hadoop" should show how the config files are placed.

          In addition, bigtop released artifacts should be prefixed with bigtop- to avoid confusion with hadoop released packages. This was not fixed by the bigtop community before the 0.1 release, and I plead with the bigtop community to fix it. The bigtop project decided to work on Hadoop packaging externally, rather than contribute to the Hadoop community to begin with. Why the change of heart to make packaging modifications in Hadoop to resolve Bigtop's own packaging issues? There has been a fair amount of investment in various projects to make packaging consistent for the Hadoop stack. Any change applied here should also be submitted to ZooKeeper, Pig, HCatalog, HBase, Chukwa, etc.

          Roman Shaposhnik added a comment -

          @Eric,

          it looks like you seem to agree with the proposal now barring the location of the native libraries (which I'm going to modify accordingly). Does it mean you have no further concerns and we can move on with the implementation and eventual commit of the updated patch?

          Bruno Mahé added a comment -

          He made my year!

          Bruno Mahé added a comment -

          At this point, I would just update the patch so we can move on.
          Otherwise he may never reply or make you wait.

          Bruno Mahé added a comment -

          @eric, sorry about the last messages. That was rude of me. I applaud all the efforts put into the current layout, and my message regarding the tarball layout was not meant to be taken as a criticism of the layout itself, but as ways to improve it.

          Roman> Can I help preparing the patch in any way?

          Arun C Murthy added a comment -

          Eric & Bruno - I urge you both to exercise restraint as I see this becoming a religious war of opinions (I do appreciate Bruno's apology).


          I've been hoping we could come to a simple consensus quickly, but I'm starting to worry this is going to be an ongoing back-and-forth between Hadoop packaging and Bigtop, and that both parties always leave with a bitter aftertaste. Now we've added YARN and I expect we'll have further issues in the future. For example, there is some chance we'll have an MPI sub-component for YARN... etc.

          Worse, this might not be just limited to Apache Hadoop and will repeat in lots of projects in the Hadoop ecosystem.

          Frankly, I don't care a whole lot and I trust folks like Roman, Eric & Bruno are more qualified on packaging matters and I'd hate to have to mediate on an ongoing basis since I obviously care about having decent packaging for all the work I do here (and I'm sure other developers feel the same). It would mean I'd have to spend a lot more time caring about FHS etc. in order to argue with two sets of experts, something I can do without.

          However, I'm coming to the conclusion, and I say this with trepidation that I may be interpreted wrongly, that we'll never get over these while Apache Bigtop keeps packaging of Hadoop projects external to the projects themselves.

          (I believe this has been brought up earlier and I confess ignorance to the reasons which led to marginalizing this issue. I'm happy to be educated.)

          Thus, I'd strongly urge that the Apache Bigtop community to consider resolving these issues once & for all and to contribute these back to the projects for a couple of significant benefits:
          a) The individual projects benefit from expertise & oversight of folks like Roman & Bruno (along with folks like Eric) so we make the right decisions up-front and not as an after-thought when we hit an issue via Bigtop.
          b) We never have to play this go-around between Hadoop projects and Bigtop on an ongoing basis which invariably will lead to religious wars of opinions as noticed on this jira.

          I say all this with utmost humility - please do not interpret this in the wrong manner.

          Eric Yang added a comment -

          > it looks like you seem to agree with the proposal now barring the location of the native libraries (which I'm going to modify accordingly). Does it mean you have no further concerns and we can move on with the implementation and eventual commit of the updated patch?

          Yes, I summarized the current features and my concerns. Look forward to the next iteration of the patch.

          Doug Cutting added a comment -

          > we'll never get over these while Apache Bigtop keeps packaging of Hadoop projects external to the projects themselves

          I think there's a good case to be made for the contrary position, that we'll never have consistent packaging if we leave it to the projects themselves. 10+ independently managed projects that release on different schedules are unlikely to be able to make the coordinated changes required to deliver a consistent set of packages on a regular basis. Moreover, the logic of different distribution and packaging conventions would then be spread across multiple projects, replicating logic that might better be consolidated in BigTop. Historically, packaging of open source software is primarily done downstream, not in the projects themselves, which instead primarily produce a source tarball.

          Doug Cutting added a comment -

          I don't want to start an upstream-versus-downstream packaging argument here. Rather my point is that there's room for both. We should not discourage or inhibit folks who wish to package downstream. I think Bruno, Roman & Eric have been making good progress towards consensus on this patch.

          Arun C Murthy added a comment -

          Thanks for the discussion, Doug. I agree a long-drawn argument isn't productive. However, please indulge me a little longer (at least for my own education).


          I agree that having downstream packagers is useful and very common. However, it is uncommon for downstream packagers to seek changes upstream, particularly to start/stop scripts. They typically maintain their own, i.e. carry the burden of maintenance themselves. It would not be unreasonable for Bigtop to do the same, i.e. maintain its own bin/hadoop etc. Not that I would prefer this.

          Yes, historically packaging is done downstream, but not in Hadoop's case. We have had our own scripts and packaging (tarballs, rpms etc.) for a long while, and we need to continue supporting them for compatibility.

          Also, Bigtop is an Apache project, and is very different from a random downstream packager/distro. It seems we could do better here at the ASF by having the two communities collaborate more closely.

          OTOH, we are currently debating adding features here which Apache Hadoop itself will never use, while assuming the burden of maintaining them.

          If the argument comes down to 'Hadoop scripts are a mess, so there is no harm in adding some more', then I have very little sympathy, however much I agree we can do better.

          It seems to me we could eat our own dogfood all the time by merging the communities for 'packaging' alone, reducing dead code and increasing collaboration. Clearly Bigtop is more than just packaging, i.e. it also does stack validation etc., which belongs in a separate project.

          My primary interest is to have as little 'dead' code in Hadoop as possible, and it seems to me we are adding a fair number of variables (features) we'll never use in Hadoop. By having Bigtop contribute the packaging back to the project we could all share the burden of maintenance. Clearly, taking away features is always harder than adding them, so we should be careful about adding them in the first place.

          Thus, it would be useful for folks in the Apache Bigtop project to share why they feel they cannot collaborate with Apache Hadoop, leading to two different implementations of Hadoop packaging within the ASF.


          Again, I appreciate this healthy discussion.

          Doug Cutting added a comment -

          > we are currently debating adding features here which Apache Hadoop will never use

          Many if not most of Hadoop's features are primarily used by downstream projects. BigTop will use these features, as would any other downstream packager. So lack of use of a feature within Hadoop itself does not seem like a reason to reject a feature.

          > and then we are assuming the burden of maintenance

          Do you think downstream packagers like BigTop will disappear? They are trying to maintain this stuff, right here, now. This is not a drive-by contribution.

          > it would be useful for folks in Apache Bigtop project to share why they feel they cannot collaborate with Apache Hadoop

          They are trying to collaborate with Apache Hadoop right here. They seem to reasonably believe that downstream packaging makes more sense, for the reasons I outlined above. Consider HBase, Pig & Hive, each of which might like to ship a single release that's compatible with multiple releases of Hadoop, e.g. 0.20.205 and 0.23, which have different layouts. Must each of these projects duplicate the logic required to handle these different layouts? Why would you inhibit folks from trying to consolidate that logic downstream?

          Roman Shaposhnik added a comment -

          Thanks for all the feedback. Based on it I decided to go ahead with a very simplified form of the original proposal.

          The patch that is now attached doesn't try to enforce any kind of naming convention, etc. It simply replaces all the hardcoded string constants with variables. That's it.

          Hopefully this will be a small enough change to be accepted without much debate.
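
          For illustration only, this is the kind of change the simplified patch amounts to in hadoop-config.sh; a minimal sketch, assuming the 0.23 tarball layout, with the variable name and default path chosen for illustration rather than copied from the attached patch:

              # Before: the tarball layout is hard-coded into the classpath logic
              CLASSPATH=${CLASSPATH}:${HADOOP_PREFIX}/share/hadoop/common/*

              # After: the same path comes from a variable whose default is the old
              # hard-coded value, so alternative layouts can override it from the
              # environment without editing the script
              HADOOP_COMMON_DIR=${HADOOP_COMMON_DIR:-"share/hadoop/common"}
              CLASSPATH=${CLASSPATH}:${HADOOP_PREFIX}/${HADOOP_COMMON_DIR}/*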

          Roman Shaposhnik made changes -
          Attachment HADOOP-7939-simplified.patch.txt [ 12511779 ]
          Roman Shaposhnik made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12511779/HADOOP-7939-simplified.patch.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/529//console

          This message is automatically generated.

          Roman Shaposhnik added a comment -

          Patch applies cleanly to trunk and branch-0.23. Please consider for commit.

          Eric Yang added a comment -

          Is it possible to change the default from lib/native to lib? lib/native was a legacy differentiation between jar files and native binaries. It seems more intuitive to use lib as the default.

          Allen Wittenauer added a comment -

          I suspect you'll need to change all the scripts that build classpaths.
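
          For context, a minimal sketch of the kind of script line such a default change would touch; the path and variable names here are assumptions for illustration, not quotes from the actual scripts:

              # If the default moved from lib/native to lib, every script that
              # assembles the native library path like this would need the same edit
              if [ -d "${HADOOP_PREFIX}/lib/native" ]; then
                JAVA_LIBRARY_PATH="${JAVA_LIBRARY_PATH:+${JAVA_LIBRARY_PATH}:}${HADOOP_PREFIX}/lib/native"
              fi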

          Roman Shaposhnik added a comment -

          @Eric, I agree with you that changing it to lib is the right thing to do. But let's do it as a separate JIRA, especially since it will require changes to the assembly.

          Eric Yang added a comment -

          @Roman, can you add a sub-task for this? It would be helpful for me to test all the changes together. Thanks

          Alejandro Abdelnur added a comment -

          Roman,

          Simplified patch looks good, a few comments:

          1. Can we get rid of the LAYOUT infix? I'd use a postfix if the purpose is to differentiate these vars from full-path vars, for example _DIR.

          2. COMMON_ as a var name seems 'too common'; can't we use HADOOP_COMMON_ for them?

          3. httpfs.sh: not sure we should use CATALINA_BASE here. What if the user has Tomcat being used for something else? Shouldn't we use HTTPFS_HOME or something like that? Or do we even need to modify this script for BIGTOP purposes?
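
          To illustrate points 1 and 2, a sketch only: the variable names below are purely illustrative and not taken from the patch:

              # Infix style questioned in point 1, with the bare COMMON_ prefix from point 2
              COMMON_LAYOUT_JARS="share/hadoop/common"

              # Postfix style suggested in point 1, with the HADOOP_COMMON_ prefix from point 2
              HADOOP_COMMON_JARS_DIR="share/hadoop/common"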

          Thanks

          Alejandro Abdelnur added a comment -

          Forgot to mention (told you offline): it seems there is a typo in the WEBAPPS value, either for common or for hdfs/yarn.

          Roman Shaposhnik made changes -
          Attachment HADOOP-7939-simplified-2.patch.txt [ 12512062 ]
          Roman Shaposhnik added a comment -

          @Alejandro,

          thanks for the feedback. I took care of #1 and #2. I left #3 as is since I think it is more consistent with how CATALINA_BASE is set in httpfs-config.sh

          Please let me know if you have any other concerns.

          Alejandro Abdelnur added a comment -

          Roman, the last couple of things:

          • The _JARS_DIR suffix is misleading, as it is overloaded: it is used both for the component JARs and for the component webapps. I think that _DIR would be a better-suited name.
          • Common webapp handling adds HADOOP_COMMON_DIR/webapps to the classpath; it should add HADOOP_COMMON_DIR instead, as the pattern in hdfs and yarn/mr is to add the dir containing the webapps dir to the classpath. I know this value was already wrong before, but while you are at it, would you please fix it (and we save a JIRA number)?
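
          A sketch of the second point; the paths are assumptions based on the 0.23 layout, not quoted from the patch:

              # Current common handling: the webapps directory itself ends up on the classpath
              CLASSPATH=${CLASSPATH}:${HADOOP_PREFIX}/${HADOOP_COMMON_DIR}/webapps

              # hdfs/yarn pattern: add the directory containing webapps, so web resources
              # resolve as webapps/<app> relative to a classpath entry
              CLASSPATH=${CLASSPATH}:${HADOOP_PREFIX}/${HADOOP_COMMON_DIR}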

          Thanks

          Roman Shaposhnik added a comment -

          Patch updated

          Roman Shaposhnik made changes -
          Attachment HADOOP-7939-simplified-3.patch.txt [ 12512064 ]
          Alejandro Abdelnur added a comment -

          +1. Thanks Roman, I've tested the patch and everything is working as expected.

          While reviewing the patch I've seen a few things that should be fixed in the scripts, but that is another JIRA.

          Alejandro Abdelnur added a comment -

          JIRA to follow up on improvements to the mapred/yarn scripts: MAPREDUCE-3745

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1679 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1679/)
          HADOOP-7939. Improve Hadoop subcomponent integration in Hadoop 0.23. (rvs via tucu)

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236929
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1607 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1607/)
          HADOOP-7939. Improve Hadoop subcomponent integration in Hadoop 0.23. (rvs via tucu)

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236929
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Alejandro Abdelnur added a comment -

          Thanks Roman. I've committed it to trunk and branch-0.23

          Alejandro Abdelnur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Commit #428 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/428/)
          Merge -r 1236928:1236929 from trunk to branch. FIXES: HADOOP-7939

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236934
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Hudson added a comment -

          Integrated in Hadoop-Common-0.23-Commit #437 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/437/)
          Merge -r 1236928:1236929 from trunk to branch. FIXES: HADOOP-7939

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236934
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1623 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1623/)
          HADOOP-7939. Improve Hadoop subcomponent integration in Hadoop 0.23. (rvs via tucu)

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236929
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Commit #453 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/453/)
          Merge -r 1236928:1236929 from trunk to branch. FIXES: HADOOP-7939

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236934
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #939 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/939/)
          HADOOP-7939. Improve Hadoop subcomponent integration in Hadoop 0.23. (rvs via tucu)

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236929
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Build #174 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/174/)
          Merge -r 1236928:1236929 from trunk to branch. FIXES: HADOOP-7939

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236934
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #152 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/152/)
          Merge -r 1236928:1236929 from trunk to branch. FIXES: HADOOP-7939

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236934
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #972 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/972/)
          HADOOP-7939. Improve Hadoop subcomponent integration in Hadoop 0.23. (rvs via tucu)

          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236929
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/sbin/httpfs.sh
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs-config.sh
          • /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-config.sh
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Harsh J made changes -
          Link This issue is duplicated by HADOOP-9878 [ HADOOP-9878 ]

            People

            • Assignee:
              Roman Shaposhnik
            • Reporter:
              Roman Shaposhnik
            • Votes:
              2
            • Watchers:
              25
