  1. Hadoop Common
  2. HADOOP-15019

Hadoop shell script classpath de-duping ignores HADOOP_USER_CLASSPATH_FIRST

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: bin
    • Labels: None

    Description

      If a user sets HADOOP_USER_CLASSPATH_FIRST=true and also includes, via HADOOP_CLASSPATH, a directory that's already in Hadoop's classpath, that directory will appear later than it should in the eventual $CLASSPATH. I believe this is because the de-duping at https://github.com/apache/hadoop/blob/cbc632d9abf08c56a7fc02be51b2718af30bad28/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L1200 ignores the "before/after" parameter when the entry is already present.
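
      The suspected behavior can be modeled with a small, self-contained sketch. This is not the Hadoop code: add_classpath_sketch and CLASSPATH_SKETCH are hypothetical names, and the sketch assumes that user entries are prepended last-to-first so that their relative order is preserved.

```shell
#!/bin/sh
# Simplified model of a de-duping classpath helper (NOT the actual Hadoop code).
# Usage: add_classpath_sketch ENTRY [before|after]
add_classpath_sketch() {
  if [ -z "$CLASSPATH_SKETCH" ]; then
    CLASSPATH_SKETCH=$1
    return 0
  fi
  case ":$CLASSPATH_SKETCH:" in
    *":$1:"*)
      # Duplicate: rejected without ever looking at $2, so a "before"
      # request leaves the entry at its existing, later position.
      return 1
      ;;
  esac
  if [ "$2" = "before" ]; then
    CLASSPATH_SKETCH="$1:$CLASSPATH_SKETCH"
  else
    CLASSPATH_SKETCH="$CLASSPATH_SKETCH:$1"
  fi
  return 0
}

CLASSPATH_SKETCH=""
# Stands in for HADOOP_CONF_DIR=/usr/share being added first.
add_classpath_sketch /usr/share

# Stands in for HADOOP_CLASSPATH=/usr/share:/tmp:/bin with "user first":
# entries are prepended last-to-first (an assumption of this sketch).
for entry in /bin /tmp /usr/share; do
  add_classpath_sketch "$entry" before || true
done

echo "$CLASSPATH_SKETCH"   # prints /tmp:/bin:/usr/share
```

      In this model /usr/share ends up third, because the final "before" request is dropped as a duplicate instead of moving the entry to the front.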

      To reproduce, first build the following trivial Java program:

      $ cat Test.java
      public class Test {
        public static void main(String[] args) {
          System.out.println(System.getenv().get("CLASSPATH"));
        }
      }
      $ javac Test.java
      $ jar cf test.jar Test.class
      

      With that, if HADOOP_CLASSPATH contains an entry that matches one Hadoop would add on its own, the ordering is not honored. It's easiest to reproduce this with a match for HADOOP_CONF_DIR, as in the second case below:

      # As you'd expect, /usr/share is first!
      $ HADOOP_CONF_DIR=/etc HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share'
      WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
      1:/usr/share
      
      # Surprise! /usr/share is now on the 3rd line, even though it was first in HADOOP_CLASSPATH.
      $ HADOOP_CONF_DIR=/usr/share HADOOP_USER_CLASSPATH_FIRST="true" HADOOP_CLASSPATH=/usr/share:/tmp:/bin bin/hadoop jar test.jar Test | tr ':' '\n' | grep -n . | grep '/usr/share'
      WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
      3:/usr/share
      

      To reiterate, what's surprising is that an entry that's first in HADOOP_CLASSPATH can end up not first in the resulting classpath, even with HADOOP_USER_CLASSPATH_FIRST set.
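
      For comparison, here is a sketch of the ordering I would have expected: a "before" request moves an existing entry to the front instead of being dropped. This is only an illustration, not a proposed patch; add_classpath_movefront and CP_SKETCH are hypothetical names, and the last-to-first prepend order is again an assumption.

```shell
#!/bin/sh
# Hypothetical variant (NOT the actual Hadoop code): on a "before" request,
# remove any existing copy of the entry, then prepend it.
add_classpath_movefront() {
  new_entry=$1
  if [ "$2" = "before" ]; then
    rebuilt=""
    old_ifs=$IFS
    IFS=:
    set -f                      # no globbing while splitting (entries may end in *)
    for part in $CP_SKETCH; do
      if [ "$part" != "$new_entry" ]; then
        rebuilt="${rebuilt:+$rebuilt:}$part"
      fi
    done
    set +f
    IFS=$old_ifs
    CP_SKETCH="$new_entry${rebuilt:+:$rebuilt}"
    return 0
  fi
  # "after" (or unspecified): keep plain de-duping, append only if absent.
  case ":$CP_SKETCH:" in
    *":$new_entry:"*) return 1 ;;
  esac
  CP_SKETCH="${CP_SKETCH:+$CP_SKETCH:}$new_entry"
}

# Stands in for HADOOP_CONF_DIR=/usr/share already being on the classpath.
CP_SKETCH=/usr/share
# Stands in for HADOOP_CLASSPATH=/usr/share:/tmp:/bin, prepended last-to-first.
for entry in /bin /tmp /usr/share; do
  add_classpath_movefront "$entry" before
done

echo "$CP_SKETCH"   # prints /usr/share:/tmp:/bin
```

      With this variant the user-supplied order /usr/share:/tmp:/bin survives at the front even though /usr/share was already present.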

      I ran into this while configuring bin/hive with a conf dir that was being used for both HDFS and Hive, and flailing as to why my log4j2.properties wasn't being read. The one in my conf dir was later in my classpath than one bundled in some Hive jar.

          People

            Assignee: Unassigned
            Reporter: Philip Martin