Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.7.2
    • Fix Version/s: None
    • Component/s: conf
    • Labels:
      None

      Description

      The Configuration class should become an interface, e.g.:

      public interface Configuration {
      String get(String nam);
      String set(String name, String value);

      int getInt(String name);
      void setInt(String name, int value);
      float getFloat(String name);
      void setFloat(String name, float value);
      //... other utility methods based on get(String) and set(String,String) ...
      }

      An abstract class named ConfigurationBase should be implemented as follows:

      public abstract class ConfigurationBase implements Configuration {
      abstract public String get(String nam);
      abstract public String set(String name, String value);

      public int getInt(String name)

      { ... implementation in terms of get(String) ... }
      public void setInt(String name, int value) {... implementation in terms of set(String, String) ...}
      public float getFloat(String name) { ... implementation in terms of get(String) ... }

      public void setFloat(String name, float value)

      {... implementation in terms of set(String, String) ...}

      //... other utility methods based on get(String) and set(String,String) ...
      }

      A concrete, default implementation will be provided as follows:

      public class ConfigurationImpl implements Writable extends ConfigurationBase {
      private Properties properties;

      // implement abstract methods from ConfigurationBase
      public String get(String name)

      { ... implemented in terms of props ...}

      public String set(String name, String value)

      { .. implemented in terms of props ... }

      // Writable methods
      public write(DataOutputStream out);
      public readFields(DataInputStream in);

      // permit chaining of configurations
      public Configuration getDefaults();
      public void setDefaults(Configuration defaults);
      }

      Only code which creates configurations should need to be updated, so this shouldn't be a huge change.

      1. HADOOP-24.patch
        57 kB
        Gal Nitzan
      2. HADOOP-24.patch
        80 kB
        Gal Nitzan
      3. HADOOP-24-2.patch
        80 kB
        Gal Nitzan
      4. HADOOP-24-3.patch
        80 kB
        Gal Nitzan
      5. HADOOP-24-4.patch
        81 kB
        Gal Nitzan
      6. configuration.java
        6 kB
        Gal Nitzan
      7. configuration.patch
        76 kB
        Gal Nitzan
      8. conf-with-factory.patch
        106 kB
        Gal Nitzan

        Issue Links

          Activity

          Hide
          Steve Loughran added a comment -

          I concur; a WONTFIX is a better outcome, with

          1. a new improvement 'support multiple configuration back ends' added.

          2. a separate issue 'add a specific ConfigurationException that extends RuntimeException to throw for Configuration problems'...that can be added before any back-end improvements, and could also be used to address HADOOP-2081 and any number format exceptions thrown from any HADOOP-2461 -related changes. This is simpler and would seem to be higher priority. Its easier to test, too.

          Show
          Steve Loughran added a comment - I concur; a WONTFIX is a better outcome, with 1. a new improvement 'support multiple configuration back ends' added. 2. a separate issue 'add a specific ConfigurationException that extends RuntimeException to throw for Configuration problems'...that can be added before any back-end improvements, and could also be used to address HADOOP-2081 and any number format exceptions thrown from any HADOOP-2461 -related changes. This is simpler and would seem to be higher priority. Its easier to test, too.
          Hide
          Doug Cutting added a comment -

          Steve, I agree with your analysis. This is a very old issue! Perhaps we should just close it as "won't fix" and, if someone needs to integrate Hadoop with a different configuration system, pursue the ConfSource approach?

          Show
          Doug Cutting added a comment - Steve, I agree with your analysis. This is a very old issue! Perhaps we should just close it as "won't fix" and, if someone needs to integrate Hadoop with a different configuration system, pursue the ConfSource approach?
          Hide
          Steve Loughran added a comment -

          Looking at this, I'm not sure moving to a formal Interface for Configuration is the right tactic, not if your goal is to allow for different back-ends. Here's why

          -there's a lot of code in Configuration which makes config work easier on top of the underling read/write a few types of name-value mappings, replicating this is expensive and creates an ongoing maintenance nightmare.

          -JobConf subclasses Configuration, and there are places where code checks for instanceof JobConf and/or a new JobConf gets created. If you really wanted to hand configuration off to something other than XML file datasources, you need to handle JobConf too, which is trickier.

          Which means, right now, you don't want to extend Configuration, you want to extend JobConf, then work out how to sneak in a different factory.

          What may be easier would be an inner ConfSource interface, which supported the raw get/set operations, possibly typed, possibly simple strings. Then you'd be able to create Configurations bound to different ConfSource back ends.

          Where there are problems here is in error handling. Currently its Configuration.loadResource() that faults on bad XML, by throwing a simple RuntimeException. If you allow different conf sources, you need to allow for trouble happening later on, when the LDAP or JDBC runtime decides to throw some exception. Either the ConfSource methods are allowed to throw any Exception, or there is a need for a standard ConfigurationRuntimeException to throw/wrap trouble.

          Show
          Steve Loughran added a comment - Looking at this, I'm not sure moving to a formal Interface for Configuration is the right tactic, not if your goal is to allow for different back-ends. Here's why -there's a lot of code in Configuration which makes config work easier on top of the underling read/write a few types of name-value mappings, replicating this is expensive and creates an ongoing maintenance nightmare. -JobConf subclasses Configuration, and there are places where code checks for instanceof JobConf and/or a new JobConf gets created. If you really wanted to hand configuration off to something other than XML file datasources, you need to handle JobConf too, which is trickier. Which means, right now, you don't want to extend Configuration, you want to extend JobConf, then work out how to sneak in a different factory. What may be easier would be an inner ConfSource interface, which supported the raw get/set operations, possibly typed, possibly simple strings. Then you'd be able to create Configurations bound to different ConfSource back ends. Where there are problems here is in error handling. Currently its Configuration.loadResource() that faults on bad XML, by throwing a simple RuntimeException. If you allow different conf sources, you need to allow for trouble happening later on, when the LDAP or JDBC runtime decides to throw some exception. Either the ConfSource methods are allowed to throw any Exception, or there is a need for a standard ConfigurationRuntimeException to throw/wrap trouble.
          Hide
          Gal Nitzan added a comment -

          Any comment???

          Show
          Gal Nitzan added a comment - Any comment???
          Hide
          Gal Nitzan added a comment -

          Done.... for some reason attaching thw file did not produce an email.

          Show
          Gal Nitzan added a comment - Done.... for some reason attaching thw file did not produce an email.
          Hide
          Gal Nitzan added a comment -

          I agree with you Doug. I also took the liberty to make the ConfigurationFactory an interface.

          signed sealed and delivered (without missing classes I hope )

          Gal.

          Show
          Gal Nitzan added a comment - I agree with you Doug. I also took the liberty to make the ConfigurationFactory an interface. signed sealed and delivered (without missing classes I hope ) Gal.
          Hide
          Doug Cutting added a comment -

          When I apply this to trunk things don't compile. I think you forgot to include ConfigurationImpl.java.

          I also wonder if, since we're editing all the places where a configuration is constructed, we might change things to use a factory. So, instead of 'new ConfigurationImpl()' we'd have something like 'ConfigurationFactory.get()'. That way, we could in the future change the default implementation without changing all of these places again. E.g., if you're nesting Hadoop in JMX then you might specify a JMX configuration factory as the default. Does that make sense?

          Show
          Doug Cutting added a comment - When I apply this to trunk things don't compile. I think you forgot to include ConfigurationImpl.java. I also wonder if, since we're editing all the places where a configuration is constructed, we might change things to use a factory. So, instead of 'new ConfigurationImpl()' we'd have something like 'ConfigurationFactory.get()'. That way, we could in the future change the default implementation without changing all of these places again. E.g., if you're nesting Hadoop in JMX then you might specify a JMX configuration factory as the default. Does that make sense?
          Hide
          Gal Nitzan added a comment -

          please ignore previous attachment. this is the correct file.

          Show
          Gal Nitzan added a comment - please ignore previous attachment. this is the correct file.
          Hide
          Gal Nitzan added a comment -

          a patch that changes Configuration class to interface

          Show
          Gal Nitzan added a comment - a patch that changes Configuration class to interface
          Hide
          Gal Nitzan added a comment -

          Hi, at last I found some time

          Attached the patch that changes Configuration into an Interface.

          It passes all tests.

          Enjoy.

          Show
          Gal Nitzan added a comment - Hi, at last I found some time Attached the patch that changes Configuration into an Interface. It passes all tests. Enjoy.
          Hide
          Gal Nitzan added a comment -

          Great, I shall rewrite and submit a new patch for review.

          Show
          Gal Nitzan added a comment - Great, I shall rewrite and submit a new patch for review.
          Hide
          Doug Cutting added a comment -

          Right, we should not add dependencies to Nutch on as-yet-unreleased features in Hadoop. But we want Nutch users to be able to use the current Hadoop trunk. So once we've made the next Hadoop release, and upgraded Nutch's trunk to use that release, then we can introduce changes to Nutch.

          Show
          Doug Cutting added a comment - Right, we should not add dependencies to Nutch on as-yet-unreleased features in Hadoop. But we want Nutch users to be able to use the current Hadoop trunk. So once we've made the next Hadoop release, and upgraded Nutch's trunk to use that release, then we can introduce changes to Nutch.
          Hide
          Gal Nitzan added a comment -

          I totaly agree with you Doug on API back compatibilty.

          +1 for org.apache.hadoop.config

          So if Configuration subclasses ConfigurationImpl you would not have to introduce changes in Nutch for now? and you can postpone it to a suitable time?

          Show
          Gal Nitzan added a comment - I totaly agree with you Doug on API back compatibilty. +1 for org.apache.hadoop.config So if Configuration subclasses ConfigurationImpl you would not have to introduce changes in Nutch for now? and you can postpone it to a suitable time?
          Hide
          Doug Cutting added a comment -

          API back-compatibility is important. We'd like folks to be able to upgrade to new releases without having to change their code. The convention we've had is that, code which compiles against release 0.X without deprecation warnings will still compile against release 0.X+1, but may require changes to compile against release 0.X+2. Is that reasonable?

          If so, then 'new Configuration()' must still create a usable configuration and not a compilation error in release 0.3. It may be deprecated, but it should still work. Yes, this is a pain, since Configuration would be a great name for the interface, but there's code out there (more than just Nutch) that we should try not to break. Nutch is a good, public test case for this sort of back-compatibility, that's the reason I mention it.

          Another approach might be to use a new package. We could put the new classes in org.apache.hadoop.config, and then have org.apache.hadoop.conf.Configuration be a deprecated subclass of org.apache.hadoop.config.ConfigurationImpl. Then, after 0.3 is released, we could remove the entire org.apache.hadoop.conf package.

          Show
          Doug Cutting added a comment - API back-compatibility is important. We'd like folks to be able to upgrade to new releases without having to change their code. The convention we've had is that, code which compiles against release 0.X without deprecation warnings will still compile against release 0.X+1, but may require changes to compile against release 0.X+2. Is that reasonable? If so, then 'new Configuration()' must still create a usable configuration and not a compilation error in release 0.3. It may be deprecated, but it should still work. Yes, this is a pain, since Configuration would be a great name for the interface, but there's code out there (more than just Nutch) that we should try not to break. Nutch is a good, public test case for this sort of back-compatibility, that's the reason I mention it. Another approach might be to use a new package. We could put the new classes in org.apache.hadoop.config, and then have org.apache.hadoop.conf.Configuration be a deprecated subclass of org.apache.hadoop.config.ConfigurationImpl. Then, after 0.3 is released, we could remove the entire org.apache.hadoop.conf package.
          Hide
          Gal Nitzan added a comment -

          Hi,

          In regards to the code compatibility - For Hadoop all new code is already included (tests also) is in this patch.

          For Nutch I already have the patch ready and it is not that big a change since most code uses Configuration (so the actual change is only in construction) so ConfigurationImpl has a copy constructor that accepts a Configuration. I actualy tested it also. I can submit it here as well if we decide to use this patch.

          I'm not sure it would be a good idea to change the current design/implementation just for the sake of compatibility. I think the design(Doug's) and the implementation are clean and shall be more understandable to implementers/extenders.

          Show
          Gal Nitzan added a comment - Hi, In regards to the code compatibility - For Hadoop all new code is already included (tests also) is in this patch. For Nutch I already have the patch ready and it is not that big a change since most code uses Configuration (so the actual change is only in construction) so ConfigurationImpl has a copy constructor that accepts a Configuration. I actualy tested it also. I can submit it here as well if we decide to use this patch. I'm not sure it would be a good idea to change the current design/implementation just for the sake of compatibility. I think the design(Doug's) and the implementation are clean and shall be more understandable to implementers/extenders.
          Hide
          Gal Nitzan added a comment -

          forgot to set abstract for getConfResource... in ConfigurationBase

          Show
          Gal Nitzan added a comment - forgot to set abstract for getConfResource... in ConfigurationBase
          Hide
          Konstantin Shvachko added a comment -

          I think that all declarations and parameters should say Configuration that is the interface.
          And the constructors should be the only places where ConfigurationImpl
          appears explicitly. E.g. all occurrences
          private static ConfigurationImpl conf = new ConfigurationImpl();
          should be replaced by
          private static Configuration conf = new ConfigurationImpl();
          I haven't seen a lot of nutch code, but if it uses only the API, then
          that should work. If it does construct config instances then there should be
          a static newInstance() somewhere I guess.

          Show
          Konstantin Shvachko added a comment - I think that all declarations and parameters should say Configuration that is the interface. And the constructors should be the only places where ConfigurationImpl appears explicitly. E.g. all occurrences private static ConfigurationImpl conf = new ConfigurationImpl(); should be replaced by private static Configuration conf = new ConfigurationImpl(); I haven't seen a lot of nutch code, but if it uses only the API, then that should work. If it does construct config instances then there should be a static newInstance() somewhere I guess.
          Hide
          Doug Cutting added a comment -

          I finally had a chance to look closely at this. One problem I now see is that it's incompatible: code that previously called 'new Configuration()' has to be updated to instead call 'new ConfigurationImpl()'. Nutch is a good test-case for back-compatibility, since it uses so much of Hadoop's APIs. Ideally we can figure out a way to do this so that, e.g., the new Hadoop jar can be dropped into existing Nutch and work correctly.

          I think this means that the interface cannot be named 'Configuration'. Instead we should probably call it ConfigurationInterface, and change ConfigurationImpl to be Configuration. Does that sound like a good solution?

          Thanks for the patch, and thanks for your patience!

          Show
          Doug Cutting added a comment - I finally had a chance to look closely at this. One problem I now see is that it's incompatible: code that previously called 'new Configuration()' has to be updated to instead call 'new ConfigurationImpl()'. Nutch is a good test-case for back-compatibility, since it uses so much of Hadoop's APIs. Ideally we can figure out a way to do this so that, e.g., the new Hadoop jar can be dropped into existing Nutch and work correctly. I think this means that the interface cannot be named 'Configuration'. Instead we should probably call it ConfigurationInterface, and change ConfigurationImpl to be Configuration. Does that sound like a good solution? Thanks for the patch, and thanks for your patience!
          Hide
          Gal Nitzan added a comment -

          Last minute changes

          Had to move getObject setObject to ConfigurationImpl

          Show
          Gal Nitzan added a comment - Last minute changes Had to move getObject setObject to ConfigurationImpl
          Hide
          Gal Nitzan added a comment -

          Added some missing interfaces

          Show
          Gal Nitzan added a comment - Added some missing interfaces
          Hide
          Gal Nitzan added a comment -

          Since this is my first interaction with Hadoop code "on a first name basis" I didn't want to rock the boat too much.

          My first goal was to implement the interfaces and the implementation with as little as possible major changes all arround the code. As it is now all test are still working and nothing broken.

          So if the patch would "pass" the reviews it would be easier to make additions when all blocks are in place.

          getPropertyNames is good idea to add as an interface to configuration.

          Thanks.

          Show
          Gal Nitzan added a comment - Since this is my first interaction with Hadoop code "on a first name basis" I didn't want to rock the boat too much. My first goal was to implement the interfaces and the implementation with as little as possible major changes all arround the code. As it is now all test are still working and nothing broken. So if the patch would "pass" the reviews it would be easier to make additions when all blocks are in place. getPropertyNames is good idea to add as an interface to configuration. Thanks.
          Hide
          Andrzej Bialecki added a comment -

          In Nutch in many places we create objects with a "pseudo-singleton" semantics (i.e. one per task), that are costly to create. These methods were added to provide a convenient caching mechanism to carry around these objects within the same task. They don't have to be serialized/deserialized.

          Show
          Andrzej Bialecki added a comment - In Nutch in many places we create objects with a "pseudo-singleton" semantics (i.e. one per task), that are costly to create. These methods were added to provide a convenient caching mechanism to carry around these objects within the same task. They don't have to be serialized/deserialized.
          Hide
          David Bowen added a comment -

          The code contains getObject and setObject methods. Is this a good idea - considering that it means that the ConfigurationImpl cannot be written out and read back without losing those values? I think it would be better to remove these and maintain the ability to serialize and deserialize. (Is there a reason not to have Configuration extend Writable?)

          What is a use case for ConfigurationBase.getProperties() which returns an object of unknown type? It might be better to have something like String[] getPropertyNames(String regex). (And this should be in Configuration also.) This allows for things like treating the property name space as a hierarchy, so that e.g. you can find all the properties beginning with some string.

          • David
          Show
          David Bowen added a comment - The code contains getObject and setObject methods. Is this a good idea - considering that it means that the ConfigurationImpl cannot be written out and read back without losing those values? I think it would be better to remove these and maintain the ability to serialize and deserialize. (Is there a reason not to have Configuration extend Writable?) What is a use case for ConfigurationBase.getProperties() which returns an object of unknown type? It might be better to have something like String[] getPropertyNames(String regex). (And this should be in Configuration also.) This allows for things like treating the property name space as a hierarchy, so that e.g. you can find all the properties beginning with some string. David
          Hide
          Gal Nitzan added a comment -

          Fix previous patch. Forgot to add the actual files

          Show
          Gal Nitzan added a comment - Fix previous patch. Forgot to add the actual files
          Hide
          Gal Nitzan added a comment -

          Hi,

          I have implemented the configuration interface.

          Remarks would be appreciated.

          Gal.

          Show
          Gal Nitzan added a comment - Hi, I have implemented the configuration interface. Remarks would be appreciated. Gal.

            People

            • Assignee:
              Doug Cutting
              Reporter:
              Doug Cutting
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development