Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: conf
    • Labels: None

      Description

      Improve how Hadoop gets configured

      The current approach of two-level XML configuration files (defaults plus site overrides) works; it offers:

      • default values in (easily overridden) configuration files, rather than just Java source
      • A way to override the default values
      • conversion from string values to types such as float and double
      • with ${property} evaluation, there is some ability to reference system values for some limited adaptation (see the sketch after this list)

      • errors show up at parse time (except for value parse problems)
      • A serialization format to exchange configuration with other nodes
      • the possibility of updating a local (in-memory) configuration
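
      A minimal sketch of how these features look in use, assuming the standard org.apache.hadoop.conf.Configuration class; the key example.name.dir is made up for illustration:

      import org.apache.hadoop.conf.Configuration;

      public class ConfigSketch {
        public static void main(String[] args) {
          // Loads hadoop-default.xml and then hadoop-site.xml, so site values override the defaults
          Configuration conf = new Configuration();

          // typed getter with an in-code fallback for when the key is absent
          int bufferSize = conf.getInt("io.file.buffer.size", 4096);

          // ${...} references are expanded against other properties and system properties
          // at the point the value is read
          conf.set("example.name.dir", "${hadoop.tmp.dir}/dfs/name");
          String nameDir = conf.get("example.name.dir");

          System.out.println(bufferSize + " " + nameDir);
        }
      }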

      But it has limits:
      [1] Any configuration change requires the XML files to be pushed out to every node
      [2] Differences between configurations can cause obscure bugs
      [3] No support for complex hierarchical configurations
      [4] No easy way to cross-reference data other than copy and paste.
      [5] No way for a deployed instance to update configuration data for other instances to query
      [6] Value type checking/dereferencing failure is not signalled by a custom error; there is no explicit exception on any of the get/set operations.
      [7] No consistency in naming.
      [8] Not easily managed by different configuration architectures/tools

      This issue is to group/track the problems, then discuss solutions.

        Issue Links

          Activity

          steve_l added a comment -

          HADOOP-1307 adds a new requirement: being able to tag a configuration parameter with a beginner/advanced/expert label.

          Consider also

          • the current format has a description, which is good, though sometimes it is out of sync with the parameter
          • if parameters could be tagged with a wiki link, we could have off-site specifics on every option. Or at least the troublesome ones.
          steve_l added a comment -

          HADOOP-24 wanted Configuration to be an interface. That is impossible, as too many parts of the system create new Configuration or JobConf instances. We should be able to provide a different ConfigurationFactory, which provides a hierarchical interface to system configuration services; Configuration would bind to one of these.
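
          A sketch of what such a factory might look like; ConfigurationFactory and the JVM property hadoop.configuration.factory are hypothetical names for this proposal, not existing Hadoop code:

          import org.apache.hadoop.conf.Configuration;

          public abstract class ConfigurationFactory {

            /** Pick an implementation named by a JVM property, falling back to today's behaviour. */
            public static ConfigurationFactory newInstance() {
              String impl = System.getProperty("hadoop.configuration.factory");
              if (impl == null) {
                return new XmlConfigurationFactory();
              }
              try {
                return (ConfigurationFactory) Class.forName(impl).newInstance();
              } catch (Exception e) {
                throw new RuntimeException("Cannot instantiate configuration factory " + impl, e);
              }
            }

            /** Return a Configuration bound to whatever configuration service this factory fronts. */
            public abstract Configuration getConfiguration();
          }

          /** The default: the existing XML-resource loading. */
          class XmlConfigurationFactory extends ConfigurationFactory {
            public Configuration getConfiguration() {
              return new Configuration();
            }
          }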

          steve_l added a comment -

          HADOOP-2385 asks for a way to validate configuration parameters. This can be done in two ways

          1. the configuration language can include validation rules, some form of schema (and not XSD schema)

          2. separate validator components can validate the configuration at runtime, and fail with useful messages.

          Both approaches have value. (1) works if the rules are simple and can be checked early on. (2) can validate the parameters against the live system, and so detect anomalies.
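
          A sketch of what a validator component for approach (2) could look like; the ConfigurationValidator interface is illustrative, and the example rule simply checks hadoop.tmp.dir against the live filesystem:

          import java.io.File;
          import org.apache.hadoop.conf.Configuration;

          interface ConfigurationValidator {
            /** Throw with a useful message if the configuration is unacceptable. */
            void validate(Configuration conf);
          }

          class TempDirValidator implements ConfigurationValidator {
            public void validate(Configuration conf) {
              String dir = conf.get("hadoop.tmp.dir", "/tmp/hadoop");
              File f = new File(dir);
              // a schema cannot know whether the path is writable on this host; a live check can
              if (!f.isDirectory() || !f.canWrite()) {
                throw new IllegalArgumentException("hadoop.tmp.dir is set to '" + dir
                    + "', which is not a writable directory on this node");
              }
            }
          }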

          steve_l added a comment -

          HADOOP-2866 implies another need:

          Have a way to mark any configuration name as deprecated. When a deprecated name is set, a message would be printed warning that it is deprecated. If the attribute could also include some descriptive text, migration would be easier, e.g.

          <deprecated>no longer read</deprecated>
          or
          <deprecated>use file.io.timeout instead</deprecated>.

          I could also imagine flipping a switch to say "no deprecations allowed", at which point a deployment with any deprecated values would fail.
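
          A sketch of how such a check might be wired up; the deprecation table, the key names and the strict switch are all illustrative:

          import java.util.HashMap;
          import java.util.Map;
          import org.apache.hadoop.conf.Configuration;

          public class DeprecationChecker {

            // hypothetical deprecated keys, with the migration text suggested above
            private static final Map<String, String> DEPRECATED = new HashMap<String, String>();
            static {
              DEPRECATED.put("old.unused.option", "no longer read");
              DEPRECATED.put("old.io.timeout", "use file.io.timeout instead");
            }

            public static void check(Configuration conf, boolean strict) {
              for (Map.Entry<String, String> entry : DEPRECATED.entrySet()) {
                if (conf.get(entry.getKey()) != null) {
                  String message = "Configuration key " + entry.getKey()
                      + " is deprecated: " + entry.getValue();
                  if (strict) {
                    // the "no deprecations allowed" switch: fail the deployment outright
                    throw new IllegalArgumentException(message);
                  }
                  System.err.println("WARNING: " + message);
                }
              }
            }
          }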

          steve_l added a comment -

          Another useful requirement is:

          • Have a way of passing a list of values as a parameter. This should be more formal than a comma-separated string such as
            /dir1,/dir2, as that introduces a parsing problem (what is the separator, what is the whitespace policy?) and makes cross-references trickier, as you cannot refer to individual parts of the list.

          The alternative in XML would be
          <value><item>/dir1</item><item>/dir2</item></value>; this is horribly verbose, but that is XML for you.

          Big problem here: backwards compatibility.
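
          For comparison, this is roughly what list handling looks like with the current comma-separated form, assuming Configuration.getStrings(); the key example.data.dirs is made up:

          import org.apache.hadoop.conf.Configuration;

          public class ListValueSketch {
            public static void main(String[] args) {
              Configuration conf = new Configuration();
              conf.set("example.data.dirs", "/dir1,/dir2");

              // getStrings() splits the value on commas; the separator and whitespace policy
              // are fixed by the implementation, and individual elements cannot be cross-referenced
              String[] dirs = conf.getStrings("example.data.dirs");
              for (String dir : dirs) {
                System.out.println(dir);
              }
            }
          }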

          steve_l added a comment -

          One thing to look out for here is the places in the code, such as hdfs.protocol.FSConstants, that do things like

          public static final int BUFFER_SIZE = new Configuration().getInt("io.file.buffer.size", 4096);
          

          That is, set a constant value from a default configuration. Currently, the only way to change such a config is to place a new hadoop-site.xml on the classpath before the class is initialized.
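
          A small illustration of why that is, with a stand-in for classes such as FSConstants; once the static initializer has run, later configuration changes cannot affect the constant:

          import org.apache.hadoop.conf.Configuration;

          class Constants {
            // evaluated once, when the class is first loaded; whatever hadoop-site.xml is on the
            // classpath at that moment wins
            public static final int BUFFER_SIZE =
                new Configuration().getInt("io.file.buffer.size", 4096);
          }

          public class InitOrderDemo {
            public static void main(String[] args) {
              System.out.println(Constants.BUFFER_SIZE);   // triggers static initialization

              // changing a Configuration afterwards has no effect on the already-frozen constant
              Configuration conf = new Configuration();
              conf.setInt("io.file.buffer.size", 65536);
              System.out.println(Constants.BUFFER_SIZE);   // still the value captured at load time
            }
          }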

          To have some other way of managing configurations, we'd really need a factory that could be driven by JVM properties,

          public static final int BUFFER_SIZE = ConfigurationFactory.newInstance().getInt("io.file.buffer.size", 4096);


            People

            • Assignee: Unassigned
            • Reporter: Steve Loughran
            • Votes: 1
            • Watchers: 7
