Thrift
  1. Thrift
  2. THRIFT-1023

Thrift encoding (UTF-8) issue with Ruby 1.9.2

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.9
    • Component/s: Ruby - Library
    • Labels:
      None
    • Environment:

      OSX, Ruby 1.9.2, Thrift Gem version 0.5.0

      Description

      I came up with an encoding issue coming from the Thrift library, and especially the BufferedTransport class.
      I've decided to write down few tests to give you a concrete example :

      1. encoding: utf-8
        require 'spec_helper'

      describe "encoding" do

      before do
      transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
      protocol = Thrift::BinaryProtocol.new(transport)
      @client = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)

      transport.open()

      @table_name = "encoding_test"
      @column_family = "info:"
      end

      it "should create a new table" do
      column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new

      {|c| c.name= @column_family}

      @client.createTable(@table_name, [column]).should be_nil
      end

      it "should save standard caracteres" do
      m = Apache::Hadoop::Hbase::Thrift::Mutation.new
      m.column = "info:first_name"
      m.value = "Vincent"

      m.value.encoding.should == Encoding::UTF_8
      @client.mutateRow(@table_name, "ID1", [m]).should be_nil
      end

      it "should save UTF8 caracteres" do
      m = Apache::Hadoop::Hbase::Thrift::Mutation.new
      m.column = "info:first_name"
      m.value = "Thorbjørn"

      m.value.encoding.should == Encoding::UTF_8
      @client.mutateRow(@table_name, "ID1", [m]).should be_nil
      end

      it "should destroy the table" do
      @client.disableTable(@table_name).should be_nil
      @client.deleteTable(@table_name).should be_nil
      end
      end

      It fails when it tries to save the UTF8 string including the caractere 'ø'.

      Here is the output :

      1) encoding should save UTF8 caracteres
      Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
      incompatible character encodings: ASCII-8BIT and UTF-8
      #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in
      `write'
      #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in
      `write_string'
      #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in
      `write'
      #/Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in
      `send_message'

      1. ./lib/thrift/hbase.rb:289:in `send_mutateRow'
      2. ./lib/thrift/hbase.rb:284:in `mutateRow'
      3. ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top
        (required)>'

      Let me know if you need any other details, thank you !

        Issue Links

          Activity

          Hide
          Jake Farrell added a comment -

          Committed v6 patch

          Show
          Jake Farrell added a comment - Committed v6 patch
          Hide
          Hudson added a comment -

          Integrated in Thrift #557 (See https://builds.apache.org/job/Thrift/557/)
          Thrift-1023:Thrift encoding (UTF-8) issue with Ruby 1.9.2
          Client: rb
          Patch: Nathan Beyer

          Fixes encoding issue for UTF-8 strings in ruby client. (Revision 1395832)

          Result = ABORTED

          Show
          Hudson added a comment - Integrated in Thrift #557 (See https://builds.apache.org/job/Thrift/557/ ) Thrift-1023:Thrift encoding (UTF-8) issue with Ruby 1.9.2 Client: rb Patch: Nathan Beyer Fixes encoding issue for UTF-8 strings in ruby client. (Revision 1395832) Result = ABORTED
          Hide
          Nathan Beyer added a comment -

          Attachment: THRIFT-1023-refactor-transport-protocol-for-ruby19-v6.patch

          This patch includes all changes to support UTF-8 encoding of Thrift strings in the binary protocols when running on Ruby 1.9+, including test cases.

          This patch also includes a few tweaks to the JSON protocol to fully support reading of unicode escapes sequences in the BMP (U+0000 to U+FFFF), as well as some test cases. I would appreciate any review of this code, as I wasn't clear why the original code asserted that all unicode escape sequences must be between U+0000 and U+00FF. I did not update the JSON protocol to fully handle unicode character above the BMP, as JSON uses a surrogate pair, double escape sequence to represent those code points and this would require a deeper refactoring that I don't have time for at the moment. Check out RFC 4627 Section 2.5 for details.

          Show
          Nathan Beyer added a comment - Attachment: THRIFT-1023-refactor-transport-protocol-for-ruby19-v6.patch This patch includes all changes to support UTF-8 encoding of Thrift strings in the binary protocols when running on Ruby 1.9+, including test cases. This patch also includes a few tweaks to the JSON protocol to fully support reading of unicode escapes sequences in the BMP (U+0000 to U+FFFF), as well as some test cases. I would appreciate any review of this code, as I wasn't clear why the original code asserted that all unicode escape sequences must be between U+0000 and U+00FF. I did not update the JSON protocol to fully handle unicode character above the BMP, as JSON uses a surrogate pair, double escape sequence to represent those code points and this would require a deeper refactoring that I don't have time for at the moment. Check out RFC 4627 Section 2.5 for details .
          Hide
          Nathan Beyer added a comment - - edited

          Attachement: THRIFT-1023-refactor-transport-protocol-for-ruby19-v5.patch

          Another adjustment to the patch. This changes the behavior of the Thrift::Bytes.convert functions to always create duplicates. I found this avoids the surprise that mutation of Strings from external classes passed into Thrift classes caused. I also added some documentation to Thrift::Bytes.

          I'm working on getting tests for this patch right now, but having a few issues with Ruby 1.9.3 native code build on OS X, per the later comments on THRIFT-1673.

          Show
          Nathan Beyer added a comment - - edited Attachement: THRIFT-1023-refactor-transport-protocol-for-ruby19-v5.patch Another adjustment to the patch. This changes the behavior of the Thrift::Bytes.convert functions to always create duplicates. I found this avoids the surprise that mutation of Strings from external classes passed into Thrift classes caused. I also added some documentation to Thrift::Bytes. I'm working on getting tests for this patch right now, but having a few issues with Ruby 1.9.3 native code build on OS X, per the later comments on THRIFT-1673 .
          Hide
          Jake Farrell added a comment -

          Thrift-1644 is committed, Nathan can you please provide the test cases for this patch

          Show
          Jake Farrell added a comment - Thrift-1644 is committed, Nathan can you please provide the test cases for this patch
          Hide
          Diwaker Gupta added a comment -

          Nathan Beyer can you attach the patch with tests included – those of us who have newer rspec locally can at least try it. FWIW, 'make check' works fine for me using v4 on OS X Mountain Lion with Ruby 1.9.3

          Show
          Diwaker Gupta added a comment - Nathan Beyer can you attach the patch with tests included – those of us who have newer rspec locally can at least try it. FWIW, 'make check' works fine for me using v4 on OS X Mountain Lion with Ruby 1.9.3
          Hide
          Nathan Beyer added a comment - - edited

          Attachement: THRIFT-1023-refactor-transport-protocol-for-ruby19-v4.patch

          I've made an additional change to my patch. This adds some checks for frozen objects. I recently learned that Ruby will copy and freeze Strings when used as keys in Hashes. This additional tweak to the patch will check for frozen Strings and 'dup' them if they are frozen.

          This obviously demonstrates the need for more unit testing, but I'm waiting for THRIFT-1644 to get resolved before building out any more unit tests.

          Show
          Nathan Beyer added a comment - - edited Attachement: THRIFT-1023-refactor-transport-protocol-for-ruby19-v4.patch I've made an additional change to my patch. This adds some checks for frozen objects. I recently learned that Ruby will copy and freeze Strings when used as keys in Hashes. This additional tweak to the patch will check for frozen Strings and 'dup' them if they are frozen. This obviously demonstrates the need for more unit testing, but I'm waiting for THRIFT-1644 to get resolved before building out any more unit tests.
          Hide
          Jake Farrell added a comment -

          1629 has a patch, it was submitted via pull request on github. We currently dont accept patches that way and I've communicated that to the contributor, just waiting on him to submit the patch to the jira ticket. I would like to avoid adding multiple dependencies, like thin and rack, if possible and keep the clients as minimal as possible.

          Show
          Jake Farrell added a comment - 1629 has a patch, it was submitted via pull request on github. We currently dont accept patches that way and I've communicated that to the contributor, just waiting on him to submit the patch to the jira ticket. I would like to avoid adding multiple dependencies, like thin and rack, if possible and keep the clients as minimal as possible.
          Hide
          Nathan Beyer added a comment -

          I'm fine with dropping Mongrel for another server. I didn't see an attachment on THRIFT-1629; does that need some assistance?

          Show
          Nathan Beyer added a comment - I'm fine with dropping Mongrel for another server. I didn't see an attachment on THRIFT-1629 ; does that need some assistance?
          Hide
          Jake Farrell added a comment -

          Nathan, would like to avoid the version checks for mongrel if possible (THRIFT-1629 has a patch which removes mongrel in favor of thin). I'm currently running through THRIFT-1023-refactor-transport-protocol-for-ruby19-v3.patch right now and testing

          Show
          Jake Farrell added a comment - Nathan, would like to avoid the version checks for mongrel if possible ( THRIFT-1629 has a patch which removes mongrel in favor of thin). I'm currently running through THRIFT-1023 -refactor-transport-protocol-for-ruby19-v3.patch right now and testing
          Hide
          Jake Farrell added a comment -

          this will be in the upcoming 0.9 release once resolved

          Show
          Jake Farrell added a comment - this will be in the upcoming 0.9 release once resolved
          Hide
          Lin Zhao added a comment -

          Guys do we have an eta when the fix will be released? We are hitting it too.

          Show
          Lin Zhao added a comment - Guys do we have an eta when the fix will be released? We are hitting it too.
          Hide
          Nathan Beyer added a comment -

          I did some testing on Ubuntu 12.04 and everything seems to work fine with Ruby 1.8.7-p370 (via RVM), 1.9.3-p194 (via RVM), 1.9.3-p0 (via Ubuntu APT).

          To get the full build to work via Ruby 1.9.3 on Ubuntu requires additional tweaks. Ubuntu packages 'ruby1.9.1' and 'ruby1.9.1-dev' are required. Also, the changes in the patch file 'THRIFT-1023-build-ruby19.patch'.

          The patch 'THRIFT-1023-build-ruby19.patch' is a bit kludgy and needs some consideration. Here's what was changed -

          • Makefile.am - There seems to be a slight behavior change in the interpreter's load mechanism, such that to get the integration tests to run a '-I.' was needed; this includes the local directory.
          • thrift.gemspec - the mongrel dependency must be removed and the easiest way to do that was add the RUBY_VERSION conditional. There's no mechanism in RubyGems to define a dependency as "optional". My suggestion would be to just remove this dependency altogether from the gemspec.
          • lib/rb/spec/mongrel_http_server_spec.rb - this adds a similar RUBY_VERSION check, which essentially removes the contents of the file when Ruby 1.9 is used. If mongrel is removed from the gemspec, then this should be tweaked to capture a LoadError on the load of 'thrift/server/mongrel_http_server'.

          I'm open to suggestions on the best approach to handle the mongrel portions of the code. My suggestion would be to remove it from the gemspec, make the specs only run when mongrel can be loaded. I think mongrel could be added to the Gemfile, such that it's optional, but I'll have to look into that.

          Show
          Nathan Beyer added a comment - I did some testing on Ubuntu 12.04 and everything seems to work fine with Ruby 1.8.7-p370 (via RVM), 1.9.3-p194 (via RVM), 1.9.3-p0 (via Ubuntu APT). To get the full build to work via Ruby 1.9.3 on Ubuntu requires additional tweaks. Ubuntu packages 'ruby1.9.1' and 'ruby1.9.1-dev' are required. Also, the changes in the patch file ' THRIFT-1023 -build-ruby19.patch'. The patch ' THRIFT-1023 -build-ruby19.patch' is a bit kludgy and needs some consideration. Here's what was changed - Makefile.am - There seems to be a slight behavior change in the interpreter's load mechanism, such that to get the integration tests to run a '-I.' was needed; this includes the local directory. thrift.gemspec - the mongrel dependency must be removed and the easiest way to do that was add the RUBY_VERSION conditional. There's no mechanism in RubyGems to define a dependency as "optional". My suggestion would be to just remove this dependency altogether from the gemspec. lib/rb/spec/mongrel_http_server_spec.rb - this adds a similar RUBY_VERSION check, which essentially removes the contents of the file when Ruby 1.9 is used. If mongrel is removed from the gemspec, then this should be tweaked to capture a LoadError on the load of 'thrift/server/mongrel_http_server'. I'm open to suggestions on the best approach to handle the mongrel portions of the code. My suggestion would be to remove it from the gemspec, make the specs only run when mongrel can be loaded. I think mongrel could be added to the Gemfile, such that it's optional, but I'll have to look into that.
          Hide
          Nathan Beyer added a comment -

          THRIFT-1023-refactor-transport-protocol-for-ruby19-v3.patch is another attempt at my patch. The difference between this and the previous patch (v2), is that instead of using what I thought were the Ruby C analogs of the String encoding methods, the code just invokes the same Ruby methods that the pure Ruby code uses.

          This is working for me on OS X Lion for Ruby 1.8.7 and 1.9.3. I'll try to test this on Ubuntu soon.

          Show
          Nathan Beyer added a comment - THRIFT-1023 -refactor-transport-protocol-for-ruby19-v3.patch is another attempt at my patch. The difference between this and the previous patch (v2), is that instead of using what I thought were the Ruby C analogs of the String encoding methods, the code just invokes the same Ruby methods that the pure Ruby code uses. This is working for me on OS X Lion for Ruby 1.8.7 and 1.9.3. I'll try to test this on Ubuntu soon.
          Hide
          Nathan Beyer added a comment -

          Now that I have a running Ubuntu 12.04 setup, I did some further testing. With the latest patch, everything runs fine on Ruby 1.8.7. However, on Ruby 1.9.3, I'm seeing some segfaults. When I remove the native code, everything seems to run fine. I suspect the issue is with how the String encoding is manipulated. Instead of using some of the more direct C methods, I'm going to attempt to rewrite it by just using the Ruby methods invoked from C.

          For those interested, I removed the native code by commenting out the 'build_ext' task in the Rakefile and then running 'bundle exec rake clean' and 'bundle exec rake gem' to test it. You should see an 'unable to load thrift_native' message in the console.

          Show
          Nathan Beyer added a comment - Now that I have a running Ubuntu 12.04 setup, I did some further testing. With the latest patch, everything runs fine on Ruby 1.8.7. However, on Ruby 1.9.3, I'm seeing some segfaults. When I remove the native code, everything seems to run fine. I suspect the issue is with how the String encoding is manipulated. Instead of using some of the more direct C methods, I'm going to attempt to rewrite it by just using the Ruby methods invoked from C. For those interested, I removed the native code by commenting out the 'build_ext' task in the Rakefile and then running 'bundle exec rake clean' and 'bundle exec rake gem' to test it. You should see an 'unable to load thrift_native' message in the console.
          Hide
          Nathan Beyer added a comment - - edited

          Arya, I re-configured like this

          ./configure --without-cpp --without-python --with-ruby

          , which on may setup meant that only the thrift compiler and the ruby library were configured. I then ran 'make' and 'make check', which both passed fine.

          When I configured without any options, the C++ library, Python library and Ruby library were configured. For completeness I tried the following two configs:

          ./configure --without-cpp --with-python --without-ruby
          ./configure --with-cpp --without-python --without-ruby

          The 'make' and 'make check' passed fine on the Python-only config, but 'make check' paused with the same output (Timeout alarm ...) for the C++-only config. This leads me to think there's something missing from the Ubuntu 12.04 setup for the C++ library.

          Show
          Nathan Beyer added a comment - - edited Arya, I re-configured like this ./configure --without-cpp --without-python --with-ruby , which on may setup meant that only the thrift compiler and the ruby library were configured. I then ran 'make' and 'make check', which both passed fine. When I configured without any options, the C++ library, Python library and Ruby library were configured. For completeness I tried the following two configs: ./configure --without-cpp --with-python --without-ruby ./configure --with-cpp --without-python --without-ruby The 'make' and 'make check' passed fine on the Python-only config, but 'make check' paused with the same output (Timeout alarm ...) for the C++-only config. This leads me to think there's something missing from the Ubuntu 12.04 setup for the C++ library.
          Hide
          Nathan Beyer added a comment -

          Arya, What results do you get when you build from trunk without the patch?

          I built from trunk on Ubuntu 12.04 without the patch and get this same result.

          Here's the setup I used -

          Show
          Nathan Beyer added a comment - Arya, What results do you get when you build from trunk without the patch? I built from trunk on Ubuntu 12.04 without the patch and get this same result. Here's the setup I used - Ubuntu 12.04 Base packages from here: http://thrift.apache.org/docs/install/ubuntu/ RVM stable Ruby 1.8.7 install via RVM bundle install from 'lib/rb' folder
          Hide
          Arya Goudarzi added a comment -

          I build Thrift with the patch on a new 12.04 Ubuntu machine with all the packages listed here:

          http://thrift.apache.org/docs/install/ubuntu/

          plus the packages for ruby. I also bundle install inside lib/rb so that correct version of rspec gets install for building. Build went ok, but when I am running "make check" test after AllProtocolTest starts to have problems:

          TBinaryProtocol => OK
          TCompactProtocol => OK
          PASS: AllProtocolsTest
          Timeout alarm expired; attempting to unblock transport
          Timeout alarm expired; attempting to unblock transport
          Timeout alarm expired; attempting to unblock transport
          Timeout alarm expired; attempting to unblock transport
          Timeout alarm expired; attempting to unblock transport

          And it keeps going. Is this concerning or I am doing something wrong? Please advice.

          Show
          Arya Goudarzi added a comment - I build Thrift with the patch on a new 12.04 Ubuntu machine with all the packages listed here: http://thrift.apache.org/docs/install/ubuntu/ plus the packages for ruby. I also bundle install inside lib/rb so that correct version of rspec gets install for building. Build went ok, but when I am running "make check" test after AllProtocolTest starts to have problems: TBinaryProtocol => OK TCompactProtocol => OK PASS: AllProtocolsTest Timeout alarm expired; attempting to unblock transport Timeout alarm expired; attempting to unblock transport Timeout alarm expired; attempting to unblock transport Timeout alarm expired; attempting to unblock transport Timeout alarm expired; attempting to unblock transport And it keeps going. Is this concerning or I am doing something wrong? Please advice.
          Hide
          Nathan Beyer added a comment -

          My patches are against trunk.

          Show
          Nathan Beyer added a comment - My patches are against trunk.
          Hide
          Arya Goudarzi added a comment -

          Hi Nathan,

          We are also heavily affected by this bug and UTF-8 writes to Cassandra, and am looking to apply your patch and test it. I also tried the patch in Thrift-1224 but that has causes some 'end of file reached errors' for us.

          Would you please tell me which version of Thrift you've applied you patch on? Trunk or 0.8?

          Cheers,
          -Arya

          Show
          Arya Goudarzi added a comment - Hi Nathan, We are also heavily affected by this bug and UTF-8 writes to Cassandra, and am looking to apply your patch and test it. I also tried the patch in Thrift-1224 but that has causes some 'end of file reached errors' for us. Would you please tell me which version of Thrift you've applied you patch on? Trunk or 0.8? Cheers, -Arya
          Hide
          Nathan Beyer added a comment - - edited
          THRIFT-1023-refactor-transport-protocol-for-ruby19-v2.patch

          This is a second revision of my previous patch. This includes some tweaks in the Ruby code from my earlier patch as well as an attempt at addressing the Ruby C extension code.

          With this patch all specs are passing on Ruby 1.8.7-p370 and Ruby 1.9.3-p194.

          This is my first attempt at writing Ruby C extension code and it's been a long time since I've written C, so I encourage everyone to review it and test it.

          Show
          Nathan Beyer added a comment - - edited THRIFT-1023-refactor-transport-protocol- for -ruby19-v2.patch This is a second revision of my previous patch. This includes some tweaks in the Ruby code from my earlier patch as well as an attempt at addressing the Ruby C extension code. With this patch all specs are passing on Ruby 1.8.7-p370 and Ruby 1.9.3-p194. This is my first attempt at writing Ruby C extension code and it's been a long time since I've written C, so I encourage everyone to review it and test it.
          Hide
          Nathan Beyer added a comment -

          I tried removing all of the native code and with only one tweak to the patch to avoid a frozen String, the patch seems to pass all relevant tests on Ruby 1.8.7 and Ruby 1.9.3.

          There's an unrelated failure in the spec of the ThreadPoolServer not getting enough invocations of 'serve'.

          Show
          Nathan Beyer added a comment - I tried removing all of the native code and with only one tweak to the patch to avoid a frozen String, the patch seems to pass all relevant tests on Ruby 1.8.7 and Ruby 1.9.3. There's an unrelated failure in the spec of the ThreadPoolServer not getting enough invocations of 'serve'.
          Hide
          Nathan Beyer added a comment -

          Nathan Beyer added a comment - 10/Jul/12 16:18
          ...
          Some of the specs are segfaulting on Ruby 1.9.3 on OS X Lion via RVM using 'bundle exec rake gem'. I haven't identified the core issue of this; i suspect it could be RSpec-1, but I don't know yet.

          While trying to debug the segfaults, I figured out there is much more Ruby C extension code than I expected and my patch doesn't cover any of the code in there. What I don't quite understand is that there is duplication between the C and Ruby code. For example, lib/thrift/memory_buffer_transport.rb defines a Thrift::MemoryBufferTransport#write method and ext/memory_buffer.c defines a Thrift::MemoryBufferTransport#write method. Is there a reason for this? Can the duplicated code be removed?

          Show
          Nathan Beyer added a comment - Nathan Beyer added a comment - 10/Jul/12 16:18 ... Some of the specs are segfaulting on Ruby 1.9.3 on OS X Lion via RVM using 'bundle exec rake gem'. I haven't identified the core issue of this; i suspect it could be RSpec-1, but I don't know yet. While trying to debug the segfaults, I figured out there is much more Ruby C extension code than I expected and my patch doesn't cover any of the code in there. What I don't quite understand is that there is duplication between the C and Ruby code. For example, lib/thrift/memory_buffer_transport.rb defines a Thrift::MemoryBufferTransport#write method and ext/memory_buffer.c defines a Thrift::MemoryBufferTransport#write method. Is there a reason for this? Can the duplicated code be removed?
          Hide
          Nathan Beyer added a comment -

          Jake Farrell added a comment - 10/Jul/12 16:49
          I am fine with switching away from mongrel if it is abandoned, I would just like to avoid adding a ton of dependencies in doing so.

          My suggestion would be to just use WEBrick, which is part of the stdlib in both Ruby 1.8.x and Ruby 1.9.x. It's not as cool and I'm sure it's not as high-performance, but it's stable and eliminates a dependency. Another variation might be to break up the gem and put some of these servers in separate gems with their own dependencies, such that user can pick and choose what they need.

          Show
          Nathan Beyer added a comment - Jake Farrell added a comment - 10/Jul/12 16:49 I am fine with switching away from mongrel if it is abandoned, I would just like to avoid adding a ton of dependencies in doing so. My suggestion would be to just use WEBrick, which is part of the stdlib in both Ruby 1.8.x and Ruby 1.9.x. It's not as cool and I'm sure it's not as high-performance, but it's stable and eliminates a dependency. Another variation might be to break up the gem and put some of these servers in separate gems with their own dependencies, such that user can pick and choose what they need.
          Hide
          Jake Farrell added a comment -

          I am fine with switching away from mongrel if it is abandoned, I would just like to avoid adding a ton of dependencies in doing so.

          Show
          Jake Farrell added a comment - I am fine with switching away from mongrel if it is abandoned, I would just like to avoid adding a ton of dependencies in doing so.
          Hide
          Nathan Beyer added a comment -

          Jake Farrell added a comment - 10/Jul/12 14:49
          We should be able to support both Ruby 1.8.7 and 1.9. Mongrel will work with the --pre option with ruby 1.9

          I wouldn't want to recommend that actually be used in practice though, as mongrel 1.2.0.pre2 was abandoned.

          Show
          Nathan Beyer added a comment - Jake Farrell added a comment - 10/Jul/12 14:49 We should be able to support both Ruby 1.8.7 and 1.9. Mongrel will work with the --pre option with ruby 1.9 I wouldn't want to recommend that actually be used in practice though, as mongrel 1.2.0.pre2 was abandoned.
          Hide
          Nathan Beyer added a comment -

          Attached is a mostly complete patch that refactors the Transport and Protocol classes to utilize binary and UTF-8 encodings appropriately.

          State of the patch -

          • There's no additional testing, which there probably should be.
          • All of the specs pass on Ruby 1.8.7 on OS X Lion via RVM using 'bundle exec rake gem'.
          • Some of the specs are segfaulting on Ruby 1.9.3 on OS X Lion via RVM using 'bundle exec rake gem'. I haven't identified the core issue of this; i suspect it could be RSpec-1, but I don't know yet.
          • There are no changes to the json_protocol.rb and there probably needs to be, but since JSON must be UTF-8 encoded as a whole, a larger analysis needs to be done on that.

          I'd appreciate any comments on the code and any ad hoc testing if others would like to help. Thanks.

          Show
          Nathan Beyer added a comment - Attached is a mostly complete patch that refactors the Transport and Protocol classes to utilize binary and UTF-8 encodings appropriately. State of the patch - There's no additional testing, which there probably should be. All of the specs pass on Ruby 1.8.7 on OS X Lion via RVM using 'bundle exec rake gem'. Some of the specs are segfaulting on Ruby 1.9.3 on OS X Lion via RVM using 'bundle exec rake gem'. I haven't identified the core issue of this; i suspect it could be RSpec-1, but I don't know yet. There are no changes to the json_protocol.rb and there probably needs to be, but since JSON must be UTF-8 encoded as a whole, a larger analysis needs to be done on that. I'd appreciate any comments on the code and any ad hoc testing if others would like to help. Thanks.
          Hide
          Jake Farrell added a comment -

          We should be able to support both Ruby 1.8.7 and 1.9. Mongrel will work with the --pre option with ruby 1.9

          Show
          Jake Farrell added a comment - We should be able to support both Ruby 1.8.7 and 1.9. Mongrel will work with the --pre option with ruby 1.9
          Hide
          Nathan Beyer added a comment -

          What's the minimum version of Ruby that must be supported? 1.8.6 or 1.8.7?

          While trying to work through some of this, I noticed that the mongrel dependency will cause an issue. The bits of the library that utilize mongrel won't work on 1.9, as mongrel went through a refactor to mongrel2 for 1.9 support. Any thought to doing one last Ruby 1.8.x release and targeting the next release to be Ruby 1.9.x minimum?

          Show
          Nathan Beyer added a comment - What's the minimum version of Ruby that must be supported? 1.8.6 or 1.8.7? While trying to work through some of this, I noticed that the mongrel dependency will cause an issue. The bits of the library that utilize mongrel won't work on 1.9, as mongrel went through a refactor to mongrel2 for 1.9 support. Any thought to doing one last Ruby 1.8.x release and targeting the next release to be Ruby 1.9.x minimum?
          Hide
          Jake Farrell added a comment -

          Nathan, when would you have a patch ready for this?

          Show
          Jake Farrell added a comment - Nathan, when would you have a patch ready for this?
          Hide
          Nathan Beyer added a comment -

          I've been doing some research and experimentation and here's what I think needs to be done for this issue and once I get a moment will try to create a patch for it.

          For Ruby 1.9+ support -

          • Make sure all Transport related classes are working with byte buffers (i.e. String with BINARY/ASCII8BIT encoding). This means that any Strings passed to the Transport classes should be assumed to be byte buffers, which would mean all Strings are force encoded to BINARY (via String#force_encoding methods), if they aren't already when passed.
          • Make sure all Protocol related classes are working Strings in UTF-8 encoding. If a String of a different encoding is passed, then transcoding should be perfomed (via String#encode). When passing string-data to the Transport classes, the Strings which are in UTF-8 encoded should be converted to byte buffers (force encoding to BINARY) before being passed on.

          Currently, the Transport classes used Strings for the byte buffers, which is fine, but I think it might be cleaner and easier to understand if a buffer class were introduced that encapsulated encoding manipulation code. There's already some code that's in a utility class that seems to already match this description, perhaps it just needs refactoring a bit.

          Please post if this doesn't make sense or there are other thoughts.

          Show
          Nathan Beyer added a comment - I've been doing some research and experimentation and here's what I think needs to be done for this issue and once I get a moment will try to create a patch for it. For Ruby 1.9+ support - Make sure all Transport related classes are working with byte buffers (i.e. String with BINARY/ASCII8BIT encoding). This means that any Strings passed to the Transport classes should be assumed to be byte buffers, which would mean all Strings are force encoded to BINARY (via String#force_encoding methods), if they aren't already when passed. Make sure all Protocol related classes are working Strings in UTF-8 encoding. If a String of a different encoding is passed, then transcoding should be perfomed (via String#encode). When passing string-data to the Transport classes, the Strings which are in UTF-8 encoded should be converted to byte buffers (force encoding to BINARY) before being passed on. Currently, the Transport classes used Strings for the byte buffers, which is fine, but I think it might be cleaner and easier to understand if a buffer class were introduced that encapsulated encoding manipulation code. There's already some code that's in a utility class that seems to already match this description, perhaps it just needs refactoring a bit. Please post if this doesn't make sense or there are other thoughts.
          Hide
          Nathan Beyer added a comment -

          Is there any additional design doc or code doc for the Ruby classes? I'm trying to discern what APIs are working with bytes and what APIs are working with characters.

          For example, Thrift::Transport::BaseTransport - are the read methods returning bytes or characters, are the writes accepting bytes are characters? I'm assuming it's bytes since the variables are named 'buf', as opposed to the 'str' names used in the protocol APIs, but want to double-check.

          Show
          Nathan Beyer added a comment - Is there any additional design doc or code doc for the Ruby classes? I'm trying to discern what APIs are working with bytes and what APIs are working with characters. For example, Thrift::Transport::BaseTransport - are the read methods returning bytes or characters, are the writes accepting bytes are characters? I'm assuming it's bytes since the variables are named 'buf', as opposed to the 'str' names used in the protocol APIs, but want to double-check.
          Hide
          Nathan Beyer added a comment -

          Any opinions on the level of Ruby 1.8 support that's needed? If $KCODE is 'U', this means all Strings are UTF-8 and the character data could just be treated as-is. This will be common, as frameworks like Rails force the interpreter into this mode at bootstrap. If $KCODE is 'N' (none/ASCII), 'E' (EUC) or 'S' (Shift JIS), then some form of transcoding would have to take place (via iconv?).

          It would be fairly easy to support the Unicode/UTF-8 mode for Ruby 1.8. The other values would not be as easy to support. Do they need to be supported?

          Note - I'm guessing that none of the existing Thrift Ruby code actually works in anything other than $KCODE of 'U'; at least not for code points that are outside of the ASCII range (0-127).

          Show
          Nathan Beyer added a comment - Any opinions on the level of Ruby 1.8 support that's needed? If $KCODE is 'U', this means all Strings are UTF-8 and the character data could just be treated as-is. This will be common, as frameworks like Rails force the interpreter into this mode at bootstrap. If $KCODE is 'N' (none/ASCII), 'E' (EUC) or 'S' (Shift JIS), then some form of transcoding would have to take place (via iconv?). It would be fairly easy to support the Unicode/UTF-8 mode for Ruby 1.8. The other values would not be as easy to support. Do they need to be supported? Note - I'm guessing that none of the existing Thrift Ruby code actually works in anything other than $KCODE of 'U'; at least not for code points that are outside of the ASCII range (0-127).
          Hide
          Jake Farrell added a comment -

          Any updates on this ticket? Getting ready to create 0.9 release candidate

          Show
          Jake Farrell added a comment - Any updates on this ticket? Getting ready to create 0.9 release candidate
          Hide
          Bjoern Rennhak added a comment -

          Ruby 1.8.x doesn't have a #force_encoding method so some solution over $KCODE = "UTF-8" is probably necessary for that.

          Show
          Bjoern Rennhak added a comment - Ruby 1.8.x doesn't have a #force_encoding method so some solution over $KCODE = "UTF-8" is probably necessary for that.
          Hide
          Jack Foy added a comment -

          Hello,

          We're a heavy Ruby shop running into the same problem under Ruby 1.9. We don't believe this is a correct patch; we'll provide more details and a fix shortly. Thanks.

          Show
          Jack Foy added a comment - Hello, We're a heavy Ruby shop running into the same problem under Ruby 1.9. We don't believe this is a correct patch; we'll provide more details and a fix shortly. Thanks.
          Hide
          Roger Meier added a comment -

          I have the following issue with your patch on debian using ruby 1.8 :

          Thrift::Client should increment the sequence id when sending messages (it seems   sequence ids are completely ignored right now)
          ./spec/client_spec.rb:56
          
          1)
          NoMethodError in 'Thrift::BufferedTransport should buffer writes and send them o  n flush'
          undefined method `force_encoding' for "one/":String
          ./spec/base_transport_spec.rb:95:
          
          2)
          NoMethodError in 'Thrift::BufferedTransport should only send buffered data once'
          undefined method `force_encoding' for "one/":String
          ./spec/base_transport_spec.rb:106:
          
          3)
          NoMethodError in 'Thrift::HTTPClientTransport should post via HTTP and return th  e results'
          undefined method `force_encoding' for "a test":String
          ./spec/http_client_spec.rb:37:
          
          4)
          NoMethodError in 'Thrift::HTTPClientTransport should send custom headers if defi  ned'
          undefined method `force_encoding' for "test":String
          ./spec/http_client_spec.rb:50:
          
          Finished in 3.558732 seconds
          
          349 examples, 4 failures, 1 pending
          rake aborted!
          Command /usr/bin/ruby1.8 -I"lib"  "/usr/bin/spec" "spec/base_protocol_spec.rb" "  spec/base_transport_spec.rb" "spec/binary_protocol_accelerated_spec.rb" "spec/bi  nary_protocol_spec.rb" "spec/client_spec.rb" "spec/exception_spec.rb" "spec/mong  rel_http_server_spec.rb" "spec/processor_spec.rb" "spec/server_socket_spec.rb" "  spec/struct_spec.rb" "spec/union_spec.rb" "spec/socket_spec.rb" "spec/json_proto  col_spec.rb" "spec/struct_nested_containers_spec.rb" "spec/compact_protocol_spec  .rb" "spec/http_client_spec.rb" "spec/nonblocking_server_spec.rb" "spec/serializ  er_spec.rb" "spec/server_spec.rb" "spec/types_spec.rb" "spec/unix_socket_spec.rb  " --color failed
          
          
          Show
          Roger Meier added a comment - I have the following issue with your patch on debian using ruby 1.8 : Thrift::Client should increment the sequence id when sending messages (it seems sequence ids are completely ignored right now) ./spec/client_spec.rb:56 1) NoMethodError in 'Thrift::BufferedTransport should buffer writes and send them o n flush' undefined method `force_encoding' for "one/":String ./spec/base_transport_spec.rb:95: 2) NoMethodError in 'Thrift::BufferedTransport should only send buffered data once' undefined method `force_encoding' for "one/":String ./spec/base_transport_spec.rb:106: 3) NoMethodError in 'Thrift::HTTPClientTransport should post via HTTP and return th e results' undefined method `force_encoding' for "a test":String ./spec/http_client_spec.rb:37: 4) NoMethodError in 'Thrift::HTTPClientTransport should send custom headers if defi ned' undefined method `force_encoding' for "test":String ./spec/http_client_spec.rb:50: Finished in 3.558732 seconds 349 examples, 4 failures, 1 pending rake aborted! Command /usr/bin/ruby1.8 -I"lib" "/usr/bin/spec" "spec/base_protocol_spec.rb" " spec/base_transport_spec.rb" "spec/binary_protocol_accelerated_spec.rb" "spec/bi nary_protocol_spec.rb" "spec/client_spec.rb" "spec/exception_spec.rb" "spec/mong rel_http_server_spec.rb" "spec/processor_spec.rb" "spec/server_socket_spec.rb" " spec/struct_spec.rb" "spec/union_spec.rb" "spec/socket_spec.rb" "spec/json_proto col_spec.rb" "spec/struct_nested_containers_spec.rb" "spec/compact_protocol_spec .rb" "spec/http_client_spec.rb" "spec/nonblocking_server_spec.rb" "spec/serializ er_spec.rb" "spec/server_spec.rb" "spec/types_spec.rb" "spec/unix_socket_spec.rb " --color failed
          Hide
          Vasiliy Sablin added a comment -

          Hello, here is my fix, as on github.

          Show
          Vasiliy Sablin added a comment - Hello, here is my fix, as on github.
          Hide
          Jake Farrell added a comment -

          We do not accept pull requests via github currently. Please see http://wiki.apache.org/thrift/HowToContribute

          Show
          Jake Farrell added a comment - We do not accept pull requests via github currently. Please see http://wiki.apache.org/thrift/HowToContribute
          Hide
          Bo Stendal Sørensen added a comment - - edited

          FYI.
          Someone did a pull request on github with a fix for this issue: https://github.com/apache/thrift/pull/9

          If the apache process requires all patches to go through jira I could easily convert the commit in the pull request into a patch and attach them here.

          Show
          Bo Stendal Sørensen added a comment - - edited FYI. Someone did a pull request on github with a fix for this issue: https://github.com/apache/thrift/pull/9 If the apache process requires all patches to go through jira I could easily convert the commit in the pull request into a patch and attach them here.
          Hide
          Andersen Fan added a comment -

          have the same problem at ubuntu/thrift0.8/ruby1.9.3

          Show
          Andersen Fan added a comment - have the same problem at ubuntu/thrift0.8/ruby1.9.3

            People

            • Assignee:
              Nathan Beyer
              Reporter:
              Vincent Peres
            • Votes:
              6 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development