Thrift / THRIFT-1023

Thrift encoding (UTF-8) issue with Ruby 1.9.2


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.9
    • Component/s: Ruby - Library
    • Labels: None
    • Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0

    Description

      I ran into an encoding issue in the Thrift library, specifically in the BufferedTransport class.
      I've written a few tests to give you a concrete example:

      # encoding: utf-8
      require 'spec_helper'

      describe "encoding" do

        before do
          # Connect to the HBase Thrift gateway through a buffered transport.
          transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
          protocol = Thrift::BinaryProtocol.new(transport)
          @client = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)

          transport.open()

          @table_name = "encoding_test"
          @column_family = "info:"
        end

        it "should create a new table" do
          column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new { |c| c.name = @column_family }

          @client.createTable(@table_name, [column]).should be_nil
        end

        # ASCII-only value: this example passes.
        it "should save standard caracteres" do
          m = Apache::Hadoop::Hbase::Thrift::Mutation.new
          m.column = "info:first_name"
          m.value = "Vincent"

          m.value.encoding.should == Encoding::UTF_8
          @client.mutateRow(@table_name, "ID1", [m]).should be_nil
        end

        # Value containing a multi-byte UTF-8 character: this example fails.
        it "should save UTF8 caracteres" do
          m = Apache::Hadoop::Hbase::Thrift::Mutation.new
          m.column = "info:first_name"
          m.value = "Thorbjørn"

          m.value.encoding.should == Encoding::UTF_8
          @client.mutateRow(@table_name, "ID1", [m]).should be_nil
        end

        it "should destroy the table" do
          @client.disableTable(@table_name).should be_nil
          @client.deleteTable(@table_name).should be_nil
        end
      end

      The spec fails when it tries to save the UTF-8 string containing the character 'ø'.
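      The same kind of failure can probably be reproduced without HBase or Thrift. The sketch below is an assumption about the mechanism, not the library's actual code: `append_to_binary_buffer` is a hypothetical stand-in for a transport write buffer that already holds binary bytes of 0x80 or above (as a packed integer header would). Appending an ASCII-only UTF-8 string works, while appending a UTF-8 string with a multi-byte character raises `Encoding::CompatibilityError`.

```ruby
# Hypothetical stand-in for a transport write buffer; this is NOT the Thrift class.
def append_to_binary_buffer(payload)
  # Array#pack returns an ASCII-8BIT (binary) string; the first byte here is 0x80.
  buffer = [0x80010001].pack('N')
  # String#<< checks that both encodings are compatible before appending.
  buffer << payload
end

# ASCII-only UTF-8 data is compatible with a binary buffer.
ascii_result = append_to_binary_buffer("Vincent")

# "\u00F8" is 'ø'; the \u escape makes the literal UTF-8 regardless of source encoding.
utf8_error = begin
  append_to_binary_buffer("Thorbj\u00F8rn")
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts ascii_result.encoding
puts utf8_error.class
```

      This would explain why the "Vincent" example passes and the "Thorbjørn" example does not: only the second string contains bytes outside the 7-bit ASCII range.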

      Here is the output:

      1) encoding should save UTF8 caracteres
         Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
         incompatible character encodings: ASCII-8BIT and UTF-8
         # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
         # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
         # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
         # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
         # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
         # ./lib/thrift/hbase.rb:284:in `mutateRow'
         # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'
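      One possible client-side workaround, sketched under the assumption that the write buffer is a binary string, is to re-tag the value as binary before handing it to the client. `as_binary` is a hypothetical helper, not part of the Thrift gem, and this is not necessarily how the issue was fixed in the library. The bytes are unchanged; only the encoding label differs, so a binary buffer accepts them.

```ruby
# Hypothetical helper (not part of the Thrift gem): returns a copy of the
# string tagged as binary. The underlying bytes are not modified.
def as_binary(value)
  value.dup.force_encoding(Encoding::BINARY)
end

name = "Thorbj\u00F8rn"          # UTF-8 string containing 'ø'
binary_name = as_binary(name)

# Stand-in for a binary write buffer that already holds a byte >= 0x80.
buffer = [0x80010001].pack('N')
buffer << binary_name            # no encoding conflict: both sides are binary

puts name.length                 # character count of the UTF-8 string
puts name.bytesize               # byte count of the same string
puts binary_name.length          # for a binary string, length counts bytes
```

      In the spec this would look like `m.value = as_binary("Thorbjørn")`; the value would then report ASCII-8BIT rather than UTF-8 as its encoding. Note also that `length` and `bytesize` differ for the UTF-8 string, which matters for any protocol that prefixes a string with its size.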

      Let me know if you need any other details. Thank you!

    People

      Assignee: Nathan Beyer (nbeyer)
      Reporter: Vincent Peres (vincentp)
      Votes: 6
      Watchers: 12
