Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: scripts
    • Labels:
      None

      Description

      (Updated from original description)

      There are various places where the HOME directories (HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, etc.) are either missing or mis-defined.

      Attachments

      1. HADOOP-10996.patch
        2 kB
        Allen Wittenauer
      2. HADOOP-10996-01.patch
        4 kB
        Allen Wittenauer
      3. HADOOP-10996-02.patch
        7 kB
        Allen Wittenauer

          Activity

          aw Allen Wittenauer added a comment - edited

          Given this:

          $ export HADOOP_COMMON_HOME=$(pwd)/$(ls -d hadoop-common-project/hadoop-common/target/hadoop-common-*/)
          $ export HADOOP_HDFS_HOME=$(pwd)/$(ls -d hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-*/)
          $ export PATH=$HADOOP_COMMON_HOME/bin:$HADOOP_HDFS_HOME/bin:$PATH
          $ hdfs
          ERROR: Unable to exec (path)target/hadoop-hdfs-3.0.0-SNAPSHOT/bin/../libexec/hadoop-functions.sh.
          

          How do we make hdfs work properly?

          First, what is happening?

          The code tries to figure out where hdfs-config.sh is located. It does this by looking in ../libexec, where it finds it. It then makes the (false) assumption that this must be the one, true libexec dir, so it tries to fire up hadoop-config.sh and hadoop-functions.sh from there, which fails.
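
          To make the failure concrete, the discovery pattern at play boils down to roughly this (a simplified sketch, not the literal script text):

          # Simplified sketch of the discovery pattern (not the literal script text):
          # libexec is assumed to sit next to whichever bin/ directory was invoked.
          bin=$(cd -P -- "$(dirname -- "${BASH_SOURCE-$0}")" >/dev/null && pwd -P)
          DEFAULT_LIBEXEC_DIR="${bin}/../libexec"
          HADOOP_LIBEXEC_DIR="${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}"

          # hdfs-config.sh is found here (it ships with HDFS), but hadoop-config.sh
          # and hadoop-functions.sh live in the common build, so sourcing them from
          # this libexec dir fails in a split HADOOP_COMMON_HOME/HADOOP_HDFS_HOME setup.
          . "${HADOOP_LIBEXEC_DIR}/hdfs-config.sh"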

          There are a couple of different ways to solve this:

          • Look to see if HADOOP_COMMON_HOME is defined and check whether hadoop-config.sh/hadoop-functions.sh are there as well.
          • Throw caution to the wind and see if this stuff is somewhere along our current PATH (see the sketch below).
          • Recalculating HADOOP_LIBEXEC_DIR in hadoop-config.sh might work too, since clearly hdfs found it.
          • Do the full gamut of checks for HADOOP_HDFS_HOME, etc., for hdfs-config.sh, plus some of the options above.
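
          For the second option, a PATH-based lookup might look roughly like this (an illustrative sketch, not code from any patch here):

          # Illustrative sketch of option 2: derive libexec from whatever hadoop
          # command is first on the PATH (not code from any patch).
          hadoop_bin=$(command -v hadoop) || exit 1
          candidate="$(cd -P -- "$(dirname -- "${hadoop_bin}")/../libexec" && pwd -P)"
          if [[ -f "${candidate}/hadoop-functions.sh" ]]; then
            HADOOP_LIBEXEC_DIR="${candidate}"
          fi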

          One sticking point: what happens if hadoop-layout.sh redefines the directory structure? The code is sort of in a catch-22.

          aw Allen Wittenauer added a comment - edited

          OK, hdfs-config.sh does the right thing (although it could be argued the order should be reversed):

          if [ -e "${HADOOP_LIBEXEC_DIR}/hadoop-config.sh" ]; then
            . "${HADOOP_LIBEXEC_DIR}/hadoop-config.sh"
          elif [ -e "${HADOOP_COMMON_HOME}/libexec/hadoop-config.sh" ]; then
            . "${HADOOP_COMMON_HOME}/libexec/hadoop-config.sh"
          elif [ -e "${HADOOP_HOME}/libexec/hadoop-config.sh" ]; then
            . "${HADOOP_HOME}/libexec/hadoop-config.sh"
          else
            echo "ERROR: Hadoop common not found." 2>&1
            exit 1
          fi
          

          So it's really hadoop-config.sh that's broken here:

          # get our functions defined for usage later
          if [[ -f "${HADOOP_LIBEXEC_DIR}/hadoop-functions.sh" ]]; then
            . "${HADOOP_LIBEXEC_DIR}/hadoop-functions.sh"
          else
            echo "ERROR: Unable to exec ${HADOOP_LIBEXEC_DIR}/hadoop-functions.sh." 1>&2
            exit 1
          fi
          
          # allow overrides of the above and pre-defines of the below
          if [[ -f "${HADOOP_LIBEXEC_DIR}/hadoop-layout.sh" ]]; then
            . "${HADOOP_LIBEXEC_DIR}/hadoop-layout.sh"
          fi
          

          This is going to be a relatively easy fix, I think. We just need to add checks for HADOOP_COMMON_HOME prior to using HADOOP_LIBEXEC_DIR.
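
          Something along these lines, for instance (a sketch of the idea only, not the attached patch; the ordering of the two checks is revisited in later comments):

          # Sketch of the idea only (the attached patch is the real change):
          # consult HADOOP_COMMON_HOME/libexec before trusting HADOOP_LIBEXEC_DIR.
          if [[ -n "${HADOOP_COMMON_HOME}" ]] \
             && [[ -f "${HADOOP_COMMON_HOME}/libexec/hadoop-functions.sh" ]]; then
            . "${HADOOP_COMMON_HOME}/libexec/hadoop-functions.sh"
          elif [[ -f "${HADOOP_LIBEXEC_DIR}/hadoop-functions.sh" ]]; then
            . "${HADOOP_LIBEXEC_DIR}/hadoop-functions.sh"
          else
            echo "ERROR: Unable to find hadoop-functions.sh." 1>&2
            exit 1
          fi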

          aw Allen Wittenauer added a comment -

          Patch that fixes hadoop-config.sh to use HADOOP_COMMON_HOME/libexec when it can't find what it needs in HADOOP_LIBEXEC_DIR, and that also fixes two bugs in the HADOOP_HDFS_HOME and HADOOP_MAPRED_HOME definitions for the case where they aren't defined.
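
          For the "aren't defined" part, the usual shell idiom is a default-when-unset expansion, roughly (illustrative only; the attached patch has the actual change):

          # Illustrative default-when-unset idiom (the attached patch has the
          # actual change): fall back to HADOOP_PREFIX when a home isn't set.
          export HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$HADOOP_PREFIX}"
          export HADOOP_MAPRED_HOME="${HADOOP_MAPRED_HOME:-$HADOOP_PREFIX}"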

          aw Allen Wittenauer added a comment -

          FWIW, I opted to reverse the order because I remembered why I did it in the other code as well: in NORMAL operating modes, HADOOP_LIBEXEC_DIR is the correct place.

          aw Allen Wittenauer added a comment -

          -01: Wait... wait... wait... We should NOT be using HADOOP_HOME for anything! So let's fix that too.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12663573/HADOOP-10996-01.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.ha.TestZKFailoverController
          org.apache.hadoop.ha.TestZKFailoverControllerStress
          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/4533//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/4533//console

          This message is automatically generated.

          aw Allen Wittenauer added a comment - edited

          Cancelling patch -01.

          After working with it, I've found some edge and not-so-edge cases that either:

          a) are made worse (e.g., usage of *_HOME should be viewed as exceptions to _PREFIX, not as an all or nothing scenario)
          b) aren't covered (e.g., etc/hadoop/*-site.xml comes from *_HOME)

          aw Allen Wittenauer added a comment -

          -02:

          Changed up the order. This reduces the number of stat calls needed by first checking whether some of the *_HOME vars are defined.
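
          In shell terms, the -n guard short-circuits before the file test, so an unset home never triggers a stat, e.g. (illustrative only, not the patch text):

          # Illustrative only: the -n test short-circuits, so an unset
          # HADOOP_COMMON_HOME never causes a stat of a bogus path.
          if [[ -n "${HADOOP_COMMON_HOME}" ]] \
             && [[ -f "${HADOOP_COMMON_HOME}/libexec/hadoop-config.sh" ]]; then
            . "${HADOOP_COMMON_HOME}/libexec/hadoop-config.sh"
          fi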

          I started to poke around at enabling *_HOME/etc/hadoop (or whatever), but decided it probably isn't worth it since it will likely lead to unpredictable results.

          Andrew Wang, please try this out and see if it fixes your specific issue. Thanks!

          andrew.wang Andrew Wang added a comment -

          Hey Allen, thanks for working on this. What's your recommendation on where to put config files? Right now I do something like:

          <above stuff>
          cp -r ~/configs/* $HADOOP_HDFS_HOME/etc/hadoop/
          hdfs namenode
          

          This still works, but your comments about *_HOME/etc/hadoop being unpredictable made me wonder.

          +1 regardless though, thanks again.

          aw Allen Wittenauer added a comment - edited

          TL;DR: Your absolute best bet is to put the configs someplace and point HADOOP_CONF_DIR at that directory, so that you have certainty about where Hadoop is pulling its settings from.
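
          For example, something like this (the config directory path here is made up):

          # Hypothetical example (the config directory path is made up):
          # keep the configs in one known place and point HADOOP_CONF_DIR at it.
          export HADOOP_CONF_DIR="$HOME/hadoop-conf"
          mkdir -p "${HADOOP_CONF_DIR}"
          cp -r ~/configs/* "${HADOOP_CONF_DIR}/"
          hdfs namenode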

          Longer story:

          Currently, if HADOOP_CONF_DIR isn't defined, it uses a bit of interesting logic to locate it:

          1. Figure out where HADOOP_PREFIX is at. Is HADOOP_PREFIX defined? If not, then let's assume it's "what's called us/..".
          2. Does HADOOP_PREFIX/conf/hadoop-env.sh exist? OK, then that must be HADOOP_CONF_DIR.
          3. No? OK, then HADOOP_CONF_DIR must be HADOOP_PREFIX/etc/hadoop.
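
          In shell form, those three steps amount to roughly this (a paraphrase, not the literal script text):

          # Paraphrase of the three steps above (not the literal script text).
          if [[ -z "${HADOOP_PREFIX}" ]]; then
            # step 1: no HADOOP_PREFIX, so assume "whatever called us"/..
            HADOOP_PREFIX=$(cd -P -- "$(dirname -- "${BASH_SOURCE-$0}")/.." >/dev/null && pwd -P)
          fi

          if [[ -f "${HADOOP_PREFIX}/conf/hadoop-env.sh" ]]; then
            HADOOP_CONF_DIR="${HADOOP_PREFIX}/conf"        # step 2
          else
            HADOOP_CONF_DIR="${HADOOP_PREFIX}/etc/hadoop"  # step 3
          fi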

          What's fun about this and what you're doing is that HADOOP_CONF_DIR will get defined differently depending upon which bin dir you are using.

          Fine, you say! Let's just treat all _HOME/etc/hadoop and _HOME/conf as potentially valid. Now we have a very interesting problem: how do you define HADOOP_CONF_DIR? Other stuff beyond Hadoop depends upon this being a single directory. We could pick the first one and then just shove the rest in the classpath and none would be the wiser!

          Aha! But they would. Which one takes precedence? What happens if there are conflicts? etc, etc. It gets messy very very fast. So... ABORT! ABORT!

          (BTW, this is pretty much the same logic from branch-2. It could be argued that there should be a check to see if etc/hadoop is 'real' too and abort on it. Here's the fun part: the shell code works perfectly fine if -env.sh is empty now... the NN will still crash though. That said, if HADOOP-10879 gets finished, this will almost certainly need to get revisited. Probably better to look for core-site.xml, honestly, since all of the sub-projects all depend upon that. In other words, we could run through all of the _HOME, HADOOP_PREFIX, etc, and use the first core-site.xml we find as the 'real' HADOOP_CONF_DIR.)
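
          A core-site.xml search like that might be sketched as follows (purely hypothetical; no such code exists in the patches here):

          # Purely hypothetical sketch: take the first home that actually
          # contains a core-site.xml as the "real" HADOOP_CONF_DIR.
          for home in HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_YARN_HOME \
                      HADOOP_MAPRED_HOME HADOOP_PREFIX; do
            candidate="${!home}/etc/hadoop"
            if [[ -n "${!home}" && -f "${candidate}/core-site.xml" ]]; then
              HADOOP_CONF_DIR="${candidate}"
              break
            fi
          done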

          aw Allen Wittenauer added a comment -

          Thanks! I'll commit this as soon as the git repo opens up!

          andrew.wang Andrew Wang added a comment -

          Cool, thanks for the explanation; will do that in the future.

            People

            • Assignee:
              aw Allen Wittenauer
            • Reporter:
              aw Allen Wittenauer
            • Votes:
              0
            • Watchers:
              3
