HADOOP-463: variable expansion in Configuration


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: conf
    • Labels: None

    Description

      Add variable expansion to the Configuration class.
      =================

      This is necessary for shared, client-side configurations:

      A Job submitter (an HDFS client) requires:
      <name>dfs.data.dir</name><value>/tmp/${user.name}/dfs</value>

      A local-mode mapreduce requires:
      <name>mapred.temp.dir</name><value>/tmp/${user.name}/mapred/tmp</value>
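
      For illustration, once expansion is in place a client just reads the fully
      resolved path. A minimal sketch (assumes the proposed feature; Configuration
      here is org.apache.hadoop.conf.Configuration):

      import org.apache.hadoop.conf.Configuration;

      public class ExpandedGet {
        public static void main(String[] args) {
          // Loads the hadoop-*.xml resources from the classpath.
          Configuration conf = new Configuration();
          // With <value>/tmp/${user.name}/dfs</value> configured, each user
          // gets a private path, e.g. /tmp/alice/dfs when user.name is "alice".
          System.out.println(conf.get("dfs.data.dir"));
        }
      }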

      Why this is necessary:
      =================

      Currently we use shared directories like:
      <name>dfs.data.dir</name><value>/tmp/dfs</value>
      This superficially seems to work.
      After all, different JVM clients create their own private subdirectories (map_xxxx), so they will not conflict.

      What really happens:

      1. /tmp/ is world-writable, as it's supposed to be.
      2. Hadoop will create missing subdirectories.
      Because this is Java, a directory such as /tmp/system is created writable only by the JVM process user.
      3. On a shared client machine, the next user's JVM finds /tmp/system owned by somebody else, so creating a directory within /tmp/system fails.
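
      A minimal sketch of the failure mode (hypothetical paths; run it as two
      different users on the same machine to reproduce):

      import java.io.File;

      public class SharedDirClash {
        public static void main(String[] args) {
          // First user's JVM creates the directory; Java's mkdirs() gives it
          // the process owner's default permissions, not world-writable ones.
          File system = new File("/tmp/system");
          System.out.println("mkdirs: " + system.mkdirs());

          // A second user's JVM finds /tmp/system owned by somebody else, so
          // creating a subdirectory under it fails (mkdirs() returns false).
          File job = new File(system, "job_0001");
          System.out.println("mkdirs under shared dir: " + job.mkdirs());
        }
      }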

      Implementation of var expansion
      =============
      In class Configuration:
      The Properties really store raw values like put("banner", "hello ${user.name}");
      In public String get(String name), post-process the returned value (a sketch follows this list):
      Use a regexp to find the pattern ${xxxx}.
      Look up xxxx as a system property.
      If found, replace ${xxxx} with the system property value.
      Else leave it as-is. An unexpanded ${xxxx} is a hint that the variable name is invalid.
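
      A minimal sketch of this expansion step (class and method names here are
      illustrative, not the attached patch):

      import java.util.Properties;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      public class VarExpansion {
        // Matches ${xxxx}; group(1) captures the variable name xxxx.
        private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

        static String substituteVars(String expr) {
          if (expr == null) {
            return null;
          }
          Matcher match = VAR.matcher(expr);
          StringBuffer result = new StringBuffer();
          while (match.find()) {
            String val = System.getProperty(match.group(1));
            if (val == null) {
              // Leave ${xxxx} as-is: a hint that the variable name is invalid.
              val = match.group(0);
            }
            match.appendReplacement(result, Matcher.quoteReplacement(val));
          }
          match.appendTail(result);
          return result.toString();
        }

        public static void main(String[] args) {
          Properties props = new Properties();
          props.put("banner", "hello ${user.name}");
          // get(name) post-processes the raw stored value before returning it.
          System.out.println(substituteVars(props.getProperty("banner")));
        }
      }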

      Other workarounds
      ===============
      The other proposed workarounds are not as elegant as variable expansion.

      Workaround 1:
      have an installation script which does:
      mkdir /tmp/dfs
      chmod ugo+rw /tmp/dfs
      Repeat for ALL configured subdirectories, at ANY nesting level.
      Keep the script in sync with changes to the Hadoop XML configuration files.
      Support the script on non-Unix platforms.
      Make sure the installation script runs before Hadoop runs for the first time.
      If users change the permissions on, or delete, any of the shared directories, it breaks again.

      Workaround 2:
      do the chmod operations from within the Hadoop code.
      In pure Java 1.4/1.5 this is not possible.
      It requires the Hadoop client process to have chmod privilege (rather than just mkdir privilege).
      It requires special-casing the directory-creation code.
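
      For reference, chmod from Java 1.4/1.5 means shelling out to the native
      command, which is exactly the extra privilege and special-casing described
      above. A hedged, Unix-only sketch (not actual Hadoop code):

      import java.io.File;
      import java.io.IOException;

      public class ChmodWorkaround {
        // Not pure Java: forks /bin/chmod, so it is Unix-only and needs the
        // client process to be allowed to change the directory's permissions.
        static void mkdirsWorldWritable(File dir)
            throws IOException, InterruptedException {
          if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("could not create " + dir);
          }
          Process p = Runtime.getRuntime().exec(
              new String[] { "chmod", "ugo+rw", dir.getAbsolutePath() });
          if (p.waitFor() != 0) {
            throw new IOException("chmod failed for " + dir);
          }
        }

        public static void main(String[] args) throws Exception {
          mkdirsWorldWritable(new File("/tmp/dfs"));
        }
      }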

      Attachments

        1. confvar.patch (8 kB, Michel Tourn)


            People

              Assignee: Unassigned
              Reporter: Michel Tourn (michel_tourn)
              Votes: 0
              Watchers: 0
