Kafka
  1. Kafka
  2. KAFKA-943

Move all configuration key string to constants

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: config
    • Labels:
      None

      Description

      The current code base has configuration key strings duplicated all over the place. They show up in the actual *Config classes, a lot of tests, command line utilities and other examples. This makes changes hard and error prone. DRY...

      The attached patch moves these configuration keys to constants and replaces their usage with a reference to the constant. It also cleans up a few old properties and a few misconfigured tests. I've admittedly not written a whole lot of Scala, so there may be some improvements that can be made, in particular I am not sure I chose the best strategy for keys needed by the SyncProducerConfigShared trait (or traits in general).

        Activity

        Hide
        Jay Kreps added a comment -

        Hey Sam, thanks for the patch it is great to get clean-up patches.

        I have sort of a philosophical disagreement, though. I am likely the cause of the direct use of string names. This was done intentionally, here is the rationale:

        Programmers have kind of a knee-jerk belief that using strings directly is bad and having variables that point at the string is good. However I don't agree. Here is what I think. A global variable is a layer of indirection. By having a layer of indirection you get the following trade-off: it is harder to figure out which configuration is pointed to by the variable BUT it is easier to change the configuration string later and have all uses updated. That is normally a good trade-off to make-we don't want string constants floating around all over the place. However configuration names are different-these are a sort of public api to everyone who uses our system. There are hundreds or thousands of people with these strings in their configuration files or code or whatever. Adding a layer of indirection doesn't help these people at all. So the fact is that configurations are meant to be a public api of sorts and only change as part of a major refactoring. So in other words we don't want to make it easy. At the same time if we use a variable name instead of the configuration name people think less about the name and we end up with bad names, which is really irritating.

        Our approach to this was to get all the committers to call out and ask for review any configuration names in their JIRA and try to get extra review on these and then not change them.

        Let me know if you buy that argument. I don't think it is a huge deal either way, but I am a little on the side of using the name directly.

        Show
        Jay Kreps added a comment - Hey Sam, thanks for the patch it is great to get clean-up patches. I have sort of a philosophical disagreement, though. I am likely the cause of the direct use of string names. This was done intentionally, here is the rationale: Programmers have kind of a knee-jerk belief that using strings directly is bad and having variables that point at the string is good. However I don't agree. Here is what I think. A global variable is a layer of indirection. By having a layer of indirection you get the following trade-off: it is harder to figure out which configuration is pointed to by the variable BUT it is easier to change the configuration string later and have all uses updated. That is normally a good trade-off to make- we don't want string constants floating around all over the place. However configuration names are different -these are a sort of public api to everyone who uses our system. There are hundreds or thousands of people with these strings in their configuration files or code or whatever. Adding a layer of indirection doesn't help these people at all. So the fact is that configurations are meant to be a public api of sorts and only change as part of a major refactoring. So in other words we don't want to make it easy. At the same time if we use a variable name instead of the configuration name people think less about the name and we end up with bad names, which is really irritating. Our approach to this was to get all the committers to call out and ask for review any configuration names in their JIRA and try to get extra review on these and then not change them. Let me know if you buy that argument. I don't think it is a huge deal either way, but I am a little on the side of using the name directly.
        Hide
        Sam Meder added a comment -

        I would actually argue that it is easier on external "code" users of Kafka to have the string constants be named/indirected. And at least in our case we use these constants much more in code than in config files (all consumer/producer props are in code, server props are mostly in config files except for test related stuff) and having named strings allows typed languages to correctly detect changes (renames and deletes) and allows for much better IDE support. I think that outweighs the benefit of having a higher barrier of entry to making changes on the Kafka side, especially since the pain of changing will be felt "more" by everyone, but we're down to opinions now.

        Anyway, an alternative to achieving some of the same goals would be to change validation in the configuration classes so it only accepts properties it knows about (unrecognized properties would cause errors). That might obviously cause people who put all configuration properties for their application (including the Kafka ones) a bit of pain, but would allow folks to detect changed and removed properties. Note the even the Kafka codebase itself was still referencing properties that no longer exist (fixed in the patch) and this is a real problem in our experience in tracking 0.8.

        Show
        Sam Meder added a comment - I would actually argue that it is easier on external "code" users of Kafka to have the string constants be named/indirected. And at least in our case we use these constants much more in code than in config files (all consumer/producer props are in code, server props are mostly in config files except for test related stuff) and having named strings allows typed languages to correctly detect changes (renames and deletes) and allows for much better IDE support. I think that outweighs the benefit of having a higher barrier of entry to making changes on the Kafka side, especially since the pain of changing will be felt "more" by everyone, but we're down to opinions now. Anyway, an alternative to achieving some of the same goals would be to change validation in the configuration classes so it only accepts properties it knows about (unrecognized properties would cause errors). That might obviously cause people who put all configuration properties for their application (including the Kafka ones) a bit of pain, but would allow folks to detect changed and removed properties. Note the even the Kafka codebase itself was still referencing properties that no longer exist (fixed in the patch) and this is a real problem in our experience in tracking 0.8.
        Hide
        Jay Kreps added a comment -

        Yeah, I think the core disagreement is that we are optimizing for slightly different uses. I think the "right way" to handle config is through a config management system external to the application-that is why we use properties to begin with instead of just a config pojo-so I think that is the use case I am optimizing for. In that case the property name is the contract. This has obvious pros and cons. The pro is having a hard separation between config and code and the ability to manage configuration at user-defined groupings (the instance, server, cluster, datacenter, and fabric levels, say). The con is that it is not compile time checked.

        I agree that this is somewhat subjective.

        The rationale for warning versus error for unused configs is the ability to make configuration detached from the application. I.e. let's say you want to roll out kafka client++ and it has a new configuration. Do you roll out the new configuration first or the new code? If you do the config first then it is important that we be able to ignore the property so that restarting your app works, if it is the code then you will end up with the default which may or may not work.

        The non-existant properties are obviously embarrassing and we should fix them irrespective of anything else.

        Show
        Jay Kreps added a comment - Yeah, I think the core disagreement is that we are optimizing for slightly different uses. I think the "right way" to handle config is through a config management system external to the application- that is why we use properties to begin with instead of just a config pojo -so I think that is the use case I am optimizing for. In that case the property name is the contract. This has obvious pros and cons. The pro is having a hard separation between config and code and the ability to manage configuration at user-defined groupings (the instance, server, cluster, datacenter, and fabric levels, say). The con is that it is not compile time checked. I agree that this is somewhat subjective. The rationale for warning versus error for unused configs is the ability to make configuration detached from the application. I.e. let's say you want to roll out kafka client++ and it has a new configuration. Do you roll out the new configuration first or the new code? If you do the config first then it is important that we be able to ignore the property so that restarting your app works, if it is the code then you will end up with the default which may or may not work. The non-existant properties are obviously embarrassing and we should fix them irrespective of anything else.
        Hide
        Sam Meder added a comment -

        So we also have a config management system external to the application, but it has a key space managed by ourselves and we do not expose Kafka properties directly in that configuration. We also end up not even exposing some of the Kafka properties, some of them can just be set to reasonable defaults for our use case and not exposed to our configuration system. As always it is hard to say how common our use is, but it doesn't strike me as that unusual.

        Fair point regarding coupling code & config (we end up deploying code & configuration in lockstep - they're versioned together). I guess one solution would be versioning configuration, but that doesn't seem worth the complexity here.

        Show
        Sam Meder added a comment - So we also have a config management system external to the application, but it has a key space managed by ourselves and we do not expose Kafka properties directly in that configuration. We also end up not even exposing some of the Kafka properties, some of them can just be set to reasonable defaults for our use case and not exposed to our configuration system. As always it is hard to say how common our use is, but it doesn't strike me as that unusual. Fair point regarding coupling code & config (we end up deploying code & configuration in lockstep - they're versioned together). I guess one solution would be versioning configuration, but that doesn't seem worth the complexity here.
        Hide
        Sam Meder added a comment -

        How about an option that turns on strict validation?

        Show
        Sam Meder added a comment - How about an option that turns on strict validation?
        Hide
        Jay Kreps added a comment -

        That would work for me.

        Show
        Jay Kreps added a comment - That would work for me.
        Hide
        Jay Kreps added a comment -

        I cleaned up the bad config strings I saw in our example server configs as part of KAFKA-718.

        Show
        Jay Kreps added a comment - I cleaned up the bad config strings I saw in our example server configs as part of KAFKA-718 .
        Hide
        Sam Meder added a comment -

        Resolving this issue since the patch wasn't accepted. I'll open a new issue for the strict validation patch

        Show
        Sam Meder added a comment - Resolving this issue since the patch wasn't accepted. I'll open a new issue for the strict validation patch

          People

          • Assignee:
            Unassigned
            Reporter:
            Sam Meder
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development