Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: scripts
    • Labels: None

Description

      The bin/hadoop script distributed with hadoop clobbers the user's CLASSPATH. This prevents ad-hoc appending to the CLASSPATH.
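
      For illustration (jar path hypothetical), the clobbering comes down to a plain assignment in bin/hadoop, so exporting CLASSPATH beforehand has no effect:

          # bin/hadoop overwrites whatever CLASSPATH the user exported
          CLASSPATH="${HADOOP_CONF_DIR}"

          # hence ad-hoc appending like this is silently ignored:
          export CLASSPATH=/home/user/lib/extra.jar
          bin/hadoop jar myjob.jar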

Attachments

      1. HADOOP-1114.patch (1 kB, Doug Cutting)
      2. hadoop-no-clobber-classpath.patch (0.4 kB, Michael Bieniosek)

Activity

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #55 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/55/)

        Hadoop QA added a comment -

        -1, could not apply patch.

        The patch command could not apply the latest attachment http://issues.apache.org/jira/secure/attachment/12355371/HADOOP-1114.patch as a patch to trunk revision r527711.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/30/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/30/console

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Doug Cutting added a comment -

        I just committed this.

        Michael Bieniosek added a comment -

        Yes, that works for me.

        Doug Cutting added a comment -

        Here's a patch that implements HADOOP_CLASSPATH.

        Michael, is this acceptable to you?
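
        In rough outline (the attachment itself is not inlined here, so this is a sketch rather than the patch's exact text), such a change to bin/hadoop could look like:

            # fold an optional, Hadoop-specific variable into the class path
            if [ "$HADOOP_CLASSPATH" != "" ]; then
              CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
            fi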

        Tom White added a comment -

        I would vote for HADOOP_CLASSPATH over plain CLASSPATH. This solves Michael's problem, while avoiding hard-to-diagnose support issues that might arise in the future from junk on the standard CLASSPATH.

        Doug Cutting added a comment -

        > I accept that using CLASSPATH for dependencies is fragile and hacky, I'm not sure why HADOOP_CLASSPATH is any different.

        It's a bit less fragile, since it won't conflict with every other Java application in the world, only with other Hadoop-based applications. That's all. For example, I've seen Windows installers that set CLASSPATH to include lots of stuff that we wouldn't want on Hadoop's class path.

        Michael Bieniosek added a comment -

        While I accept that using CLASSPATH for dependencies is fragile and hacky, I'm not sure why HADOOP_CLASSPATH is any different. It seems both environment variables are equally abusable. Using CLASSPATH is much more intuitive than the nonstandard HADOOP_CLASSPATH. The only problem that HADOOP_CLASSPATH seems to solve is that a caller script might leave inherited garbage in CLASSPATH before adding more jars. But that should be the responsibility of the caller script, I think.

        Doug Cutting added a comment -

        > Currently our hadoop jars also depend on non-java code which can't be distributed via jar.

        Job jar files are unpacked in the directory where the job jvm is run, so non-java data and code in the jar file can be easily accessed via relative paths.
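
        (Illustrative sketch, file names hypothetical: because the jar's contents sit under the task's working directory, bundled non-java material is reachable directly.)

            # run from within a task's working directory
            ./scripts/helper.sh          # a non-java script shipped inside the job jar
            cat data/lookup-table.txt    # likewise for plain data files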

        > we would just need to write a combine-jar tool

        Ant has good tools for constructing jar files.

        In any case, if you like to install all your software in a separate directory, and want parts of it to appear on the classpath of the Hadoop daemons, that's fine. I just think our scripts should not rely on the overused CLASSPATH environment variable, but would be less fragile if they use HADOOP_CLASSPATH.

        Michael Bieniosek added a comment -

        We have a separate mechanism to distribute generic software; this is what we currently use to distribute jar dependencies. Currently our hadoop jars also depend on non-java code which can't be distributed via jar. So we could package all our java code into a single jar (we would just need to write a combine-jar tool). But this wouldn't solve our problem, because some of our code wouldn't go in the jar and would still have to be distributed through a different mechanism. I'd rather be supporting one software distribution system than two.

        Doug Cutting added a comment -

        > We could package all the dependent code into the same jar, but that seems unnecessary.

        Why is this unnecessary? That's the intended mechanism. Job jars may include a lib/ directory with other jar files, just like a war file. Packaging things this way makes it much easier to update your code without restarting Hadoop daemons.
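
        A sketch of building such a jar (file names hypothetical):

            # war-style job jar: classes at the root, dependent jars under lib/
            jar cf myjob.jar -C classes . lib/custom-dep.jar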

        Michael Bieniosek added a comment -

        Currently, the hadoop runjar command takes a single user jar as its argument. However, our jar depends on other (custom) java libraries, so we get around this with this CLASSPATH hack. We could drop dependent jars into the hadoop lib directory, but I'd rather not mix shipped hadoop code with our user code.

        We could package all the dependent code into the same jar, but that seems unnecessary. A better alternative might be to set a CLASSPATH in the jar manifest, but I haven't thought very much about how that would work.
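
        (For reference, the manifest route might look roughly like this, entries hypothetical; Class-Path names resolve relative to the jar's own location:)

            printf 'Class-Path: lib/custom-dep.jar lib/other-dep.jar\n' > manifest.txt
            jar cfm myjob.jar manifest.txt -C classes .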

        So unless there is another method that is better yet still simple, we need some way to insert jar dependencies into hadoop. Your example of Tomcat is a bit different, since there is a well-defined mechanism for getting dependent jars into servlet containers (the WEB-INF/lib directory).

        It doesn't seem very worthwhile to rename CLASSPATH to HADOOP_CLASSPATH.

        Doug Cutting added a comment -

        Sorry. I'm coming to this rather late. But, for what it's worth, the prior behavior was intentional. The CLASSPATH environment variable is fragile to use. Folks can end up with lots of crazy stuff on it (e.g., conflicting versions of libraries) that can break things in confusing ways. Thus it's generally better to not rely on it.

        Sun subtly discourages using the CLASSPATH environment variable:

        http://java.sun.com/j2se/1.5.0/docs/tooldocs/windows/classpath.html#env%20var

        Tomcat's startup scripts erase any pre-existing CLASSPATH values:

        http://svn.apache.org/repos/asf/tomcat/container/tc5.5.x/catalina/src/bin/setclasspath.sh

        However, Ant does respect it:

        http://svn.apache.org/viewvc/ant/core/trunk/src/script/ant?view=markup

        To add things to Hadoop's classpath one can simply add files to Hadoop's lib directory (as folks typically add junit's jar to ant's lib). If that's insufficient, I'd rather we add a HADOOP_CLASSPATH environment variable than use the fragile, global CLASSPATH.
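
        Hypothetical usage under that proposal (path and class names invented):

            # either drop the jar into Hadoop's lib/ directory, or:
            export HADOOP_CLASSPATH=/opt/acme/lib/extra.jar
            bin/hadoop jar ourjob.jar com.acme.OurJob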

        What do others think?

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #53 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/53/)

        Tom White added a comment -

        Yes, I just tried it and this change works when the user's classpath variable is unset.

        I've just committed this. Thanks Michael!

        Michael Bieniosek added a comment -

        I don't think it matters; if you set your classpath to ::::::::::/somejar.jar I think java will still do the right thing.

        Tom White added a comment -

        Should this check to see if CLASSPATH has been previously set before including it in the new CLASSPATH?
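
        A minimal sketch of the guarded form in question (assuming POSIX sh):

            # only fold CLASSPATH in when the user actually set it
            if [ "$CLASSPATH" != "" ]; then
              CLASSPATH="${CLASSPATH}:${HADOOP_CONF_DIR}"
            else
              CLASSPATH="${HADOOP_CONF_DIR}"
            fi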

        Hadoop QA added a comment -

        -1, because the patch command could not apply the latest attachment http://issues.apache.org as a patch to trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525505.

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

        Michael Bieniosek added a comment -

        Here's my proposed patch:

        --- bin/hadoop.orig
        +++ bin/hadoop
        @@ -74,7 +74,8 @@
         fi

         # CLASSPATH initially contains $HADOOP_CONF_DIR
        -CLASSPATH="${HADOOP_CONF_DIR}"
        +# respect previously set CLASSPATH
        +CLASSPATH="$CLASSPATH:${HADOOP_CONF_DIR}"
         CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar

         # for developers, add Hadoop classes to CLASSPATH

People

    • Assignee: Doug Cutting
    • Reporter: Michael Bieniosek
    • Votes: 0
    • Watchers: 1
