  1. Hadoop Common
  2. HADOOP-6997

'hadoop' script should set LANG or LC_COLLATE explicitly for classpath order


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.20.2, 0.21.0, 0.22.0
    • Fix Version/s: None
    • Component/s: scripts
    • Labels: None

    Description

      The 'hadoop' script builds the classpath in pieces, including the following bit for the bulk of it:

      # add libs to CLASSPATH
      for f in $HADOOP_HOME/lib/*.jar; do
        CLASSPATH=${CLASSPATH}:$f;
      done
      

      The ordering of "*.jar", i.e., the collation order, depends on LANG or LC_COLLATE on Linux systems (or on LC_ALL, which overrides both). Because the script sets none of these itself, it inherits whatever the user's environment specifies; for Red Hat, the default is "en_US", which is a case-insensitive (and punctuation-insensitive?) ordering. If LANG is set to "C" instead, the ordering changes to plain byte order, in which all uppercase ASCII letters sort before lowercase ones.

      The key issue here is that $HADOOP_HOME/lib contains both upper- and lowercase jar names (e.g., "SimonTool.jar" and "commons-logging-1.1.1.jar", to pick a completely random pair), which will have an inverted order depending on which setting is used.
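The inversion can be reproduced outside Hadoop with a small sketch like the one below. It assumes a shell that sorts glob results by the current locale (bash does), and it uses a hypothetical scratch directory (`demo_lib`) and helper (`build_classpath`) in place of the real $HADOOP_HOME/lib. Only the "C" result is certain; the "en_US.UTF-8" result depends on that locale being installed.

```shell
# Hypothetical scratch directory standing in for $HADOOP_HOME/lib.
demo_lib=$(mktemp -d)
touch "$demo_lib/SimonTool.jar" "$demo_lib/commons-logging-1.1.1.jar"

# Join the jar names with ':' in glob order, under the locale given as $1.
# The subshell keeps the locale override from leaking into the caller.
build_classpath() {
  (
    LC_ALL=$1
    export LC_ALL
    cp=
    for f in "$demo_lib"/*.jar; do
      cp=${cp:+$cp:}${f##*/}
    done
    printf '%s\n' "$cp"
  )
}

c_order=$(build_classpath C)
# If en_US.UTF-8 is not installed, the shell may fall back to byte order.
en_order=$(build_classpath en_US.UTF-8 2>/dev/null)

echo "C:     $c_order"    # SimonTool.jar:commons-logging-1.1.1.jar
echo "en_US: $en_order"   # typically the reverse where the locale exists

rm -rf "$demo_lib"
```

Whichever jar comes first wins when both define the same class, which is why the order matters for the classpath.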

      'hadoop' should explicitly set LANG and/or LC_COLLATE to whatever setting it's implicitly assuming.
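One possible shape for such a change is sketched below; it is an illustration of the request, not a patch that was applied to the script. The helper name `add_lib_jars` is hypothetical, and the choice of "C" as the pinned locale is an assumption, since the report does not say which ordering the script expects. The override is confined to a subshell so that the locale seen by the JVM is left alone.

```shell
# Sketch: append every jar in directory $2 to classpath $1, always in
# byte ("C") order regardless of the caller's LANG/LC_COLLATE/LC_ALL.
# add_lib_jars is a hypothetical helper, not part of the 'hadoop' script.
add_lib_jars() {
  (
    # LC_ALL overrides LC_COLLATE and LANG, so setting it here pins the
    # glob order; the subshell discards the override afterwards.
    LC_ALL=C
    export LC_ALL
    cp=$1
    for f in "$2"/*.jar; do
      [ -e "$f" ] || continue   # skip the literal pattern if no jar matches
      cp=${cp}:$f
    done
    printf '%s' "$cp"
  )
}
```

In the script, the existing loop would then become something like `CLASSPATH=$(add_lib_jars "$CLASSPATH" "$HADOOP_HOME/lib")`.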

    People

      Assignee: Unassigned
      Reporter: Greg Roelofs (roelofs)
      Votes: 0
      Watchers: 4