Kafka
  1. Kafka
  2. KAFKA-367

StringEncoder/StringDecoder use platform default character set

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      StringEncoder and StringDecoder take the platform default character set. This is bad since the messages they produce are sent off that machine. We should
      – add a new required argument to these that adds the character set and default to UTF-8 rather than the machine setting
      – add a commandline parameter for the console-* tools to let you specify the correct encoding.

      1. KAFKA-367-5.patch
        8 kB
        Eli Reisman
      2. KAFKA-367-4.patch
        5 kB
        Eli Reisman
      3. KAFKA-367-3.patch
        3 kB
        Eli Reisman
      4. KAFKA-367-3.patch
        3 kB
        Eli Reisman
      5. KAFKA-367-2.patch
        5 kB
        Eli Reisman
      6. KAFKA-367-1.patch
        4 kB
        Eli Reisman

        Issue Links

          Activity

          Hide
          Jay Kreps added a comment -

          Eli, I was also working on the same area for KAFKA-544 so I attempted to integrate this patch into that one. I have committed that to the 0.8 branch. If you get a chance take a look at the current 0.8 branch HEAD and let me know if I got it right.

          Show
          Jay Kreps added a comment - Eli, I was also working on the same area for KAFKA-544 so I attempted to integrate this patch into that one. I have committed that to the 0.8 branch. If you get a chance take a look at the current 0.8 branch HEAD and let me know if I got it right.
          Hide
          Eli Reisman added a comment -

          This includes the Encoder/Decoder improvement as in the 367-4 patch, and the Consumer and Producer console options to set character encoding manually and place it into the relevant Properties objects for consumption inside Kafka. Passes SBT tests etc. should be good to go?

          Show
          Eli Reisman added a comment - This includes the Encoder/Decoder improvement as in the 367-4 patch, and the Consumer and Producer console options to set character encoding manually and place it into the relevant Properties objects for consumption inside Kafka. Passes SBT tests etc. should be good to go?
          Hide
          Eli Reisman added a comment -

          This seems to be working/passing all the tests. If this is looking like what you had in mind, I can add a command line option to set the encoding manually and maybe we're good to go.

          Show
          Eli Reisman added a comment - This seems to be working/passing all the tests. If this is looking like what you had in mind, I can add a command line option to set the encoding manually and maybe we're good to go.
          Hide
          Eli Reisman added a comment -

          I still need to add console opts for encoding and write these new constructors into the producer and consumer stuff, but the way I have the patch now I believe the properties calls have hardcoded defaults to "UTF-8" so acutally I do think even the no-arg constructors would be safe with this in the case where no character encoding option is chosen. Am I wrong about that?

          Anyway there's more to do here regardless, will get to it and post something ASAP

          Show
          Eli Reisman added a comment - I still need to add console opts for encoding and write these new constructors into the producer and consumer stuff, but the way I have the patch now I believe the properties calls have hardcoded defaults to "UTF-8" so acutally I do think even the no-arg constructors would be safe with this in the case where no character encoding option is chosen. Am I wrong about that? Anyway there's more to do here regardless, will get to it and post something ASAP
          Hide
          Eli Reisman added a comment -

          Thanks, sounds good. Will do.

          Show
          Eli Reisman added a comment - Thanks, sounds good. Will do.
          Hide
          Jay Kreps added a comment -

          Yeah I think the contract now becomes "your class needs to implement the interface and provide a constructor that takes Properties, so it does make sense to have this in the class rather than the trait since we won't know what to do with the properties except in the class.

          I think we do need to use the new constructor to make this bug fixed since otherwise we are still going to pick up the default char set.

          Show
          Jay Kreps added a comment - Yeah I think the contract now becomes "your class needs to implement the interface and provide a constructor that takes Properties, so it does make sense to have this in the class rather than the trait since we won't know what to do with the properties except in the class. I think we do need to use the new constructor to make this bug fixed since otherwise we are still going to pick up the default char set.
          Hide
          Eli Reisman added a comment -

          Sorry, that makes sense. Wasn't sure if that was "to be added later when someone needs it" or if that should be wired up now. Will do. Kind of felt like the constructor should have been in the trait rather than the classes too but the compiler didn't agree. Might try another swipe at getting that to happen too.

          Show
          Eli Reisman added a comment - Sorry, that makes sense. Wasn't sure if that was "to be added later when someone needs it" or if that should be wired up now. Will do. Kind of felt like the constructor should have been in the trait rather than the classes too but the compiler didn't agree. Might try another swipe at getting that to happen too.
          Hide
          Jay Kreps added a comment -

          This looks good, but I don't see where we make use of the new constructor. I think we need to pass in the properties in the producer and consumer, right?

          Show
          Jay Kreps added a comment - This looks good, but I don't see where we make use of the new constructor. I think we need to pass in the properties in the producer and consumer, right?
          Hide
          Eli Reisman added a comment -

          Is this more like it? I am still getting an error on several tests on SBT but they are ZooKeeper tests and I'm not certain if they relate directly to this patch or my local setup is not right for running the tests? They culminate on the Kafka side of the code in 2 ZK tests at Utils.getObject() line 675. Does this ring any bells?

          Thanks!

          Show
          Eli Reisman added a comment - Is this more like it? I am still getting an error on several tests on SBT but they are ZooKeeper tests and I'm not certain if they relate directly to this patch or my local setup is not right for running the tests? They culminate on the Kafka side of the code in 2 ZK tests at Utils.getObject() line 675. Does this ring any bells? Thanks!
          Hide
          Eli Reisman added a comment -

          Is this more like it? This seems right and the decoder related tests seem to be passing, but when I run the suite in SBT on my local machine I get several errors in tests regarding ZooKeeper. I am unaware whether these are due to local test config or genuine trouble. They seem to culminate at the Kafka end in Utils.getObject() at line 675.

          Show
          Eli Reisman added a comment - Is this more like it? This seems right and the decoder related tests seem to be passing, but when I run the suite in SBT on my local machine I get several errors in tests regarding ZooKeeper. I am unaware whether these are due to local test config or genuine trouble. They seem to culminate at the Kafka end in Utils.getObject() at line 675.
          Hide
          Eli Reisman added a comment -

          Love it. I'll get right on this, thanks for the advice!

          Show
          Eli Reisman added a comment - Love it. I'll get right on this, thanks for the advice!
          Hide
          Jay Kreps added a comment -

          This looks good. The only issue is that the encoder and decoder are pluggable (you just give the class name). This makes hard coding a property as a variable in the config object for a particular serializer a little awkward. I think this is actually a general need for any kind of serializer not just strings, the serializer might need access to various URLs, or to a schema string or all kinds of other things. Since we want people to be able to make their own serializers without modifying any kafka code, this is not very good.

          Currently we require a no-arg constructor for the serializer. What if, instead, we required a constructor that took a single argument, the java.util.Properties instance? This way you could write a custom serializer and pass any properties you liked to it without needing to modify any kafka code. Perhaps ideally we could look for either the Properties only constructor or the no-arg constructor preferring the properties one, allowing backwards compatibility with any encoders people have already.

          Show
          Jay Kreps added a comment - This looks good. The only issue is that the encoder and decoder are pluggable (you just give the class name). This makes hard coding a property as a variable in the config object for a particular serializer a little awkward. I think this is actually a general need for any kind of serializer not just strings, the serializer might need access to various URLs, or to a schema string or all kinds of other things. Since we want people to be able to make their own serializers without modifying any kafka code, this is not very good. Currently we require a no-arg constructor for the serializer. What if, instead, we required a constructor that took a single argument, the java.util.Properties instance? This way you could write a custom serializer and pass any properties you liked to it without needing to modify any kafka code. Perhaps ideally we could look for either the Properties only constructor or the no-arg constructor preferring the properties one, allowing backwards compatibility with any encoders people have already.
          Hide
          Eli Reisman added a comment -

          Another attempt. Also not working quite yet. I'm new to Scala and not quite sure which syntax trick allows you to call a default constructor with and without parens from client code, and to have an override with a string in the constructor without upsetting Scala or the test code. This is happening in my modifications to StringEncoder. I would ideally like to make the character encoding setting a "val" as well. But all this will be solved soon I hope...

          Feel free to take a look and advise a wayward Java refugee...

          Show
          Eli Reisman added a comment - Another attempt. Also not working quite yet. I'm new to Scala and not quite sure which syntax trick allows you to call a default constructor with and without parens from client code, and to have an override with a string in the constructor without upsetting Scala or the test code. This is happening in my modifications to StringEncoder. I would ideally like to make the character encoding setting a "val" as well. But all this will be solved soon I hope... Feel free to take a look and advise a wayward Java refugee...
          Hide
          Eli Reisman added a comment -

          Producer side is messed up, working on patch #2 here...

          Show
          Eli Reisman added a comment - Producer side is messed up, working on patch #2 here...
          Hide
          Eli Reisman added a comment -

          First attempt here. Defaulting StringEncoder/Decoder to "UTF-8" is done, should be able to get user-defined alternate charsets from CLI and use from Consumer side, not so sure about Producer side.

          Let me know if there's stuff to fix here, thanks again!

          Show
          Eli Reisman added a comment - First attempt here. Defaulting StringEncoder/Decoder to "UTF-8" is done, should be able to get user-defined alternate charsets from CLI and use from Consumer side, not so sure about Producer side. Let me know if there's stuff to fix here, thanks again!

            People

            • Assignee:
              Eli Reisman
              Reporter:
              Jay Kreps
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development