Apache Avro / AVRO-1783

Gracefully handle strings with wrong character encoding



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.7
    • Fix Version/s: 1.8.0
    • Component/s: ruby
    • Labels: None


      In the vote thread for Avro 1.8.0-rc2, busbey noticed that phunt's avro-rpc-quickstart fails:

      busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
      Avro::IO::AvroTypeError: The datum
      is not an example of schema
                    write_data at
                  write_record at
                          each at org/jruby/RubyArray.java:1613
                  write_record at
                    write_data at
                         write at
       write_handshake_request at
                       request at
                       request at
                        (root) at sample_ipc_client.rb:49

      I tried reproducing the error, and it is quite strange. avro-rpc-quickstart works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, busbey was using JRuby 1.7.3 (as visible from the path names above), and in this particular version of JRuby I was able to reproduce the issue.

      It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3 returns a UTF-8 encoded string from Digest::MD5.digest, rather than a binary-encoded string. Schema.validate checks whether the string is a suitable datum for a fixed type by calling #size, which counts characters rather than bytes. In this case, although the MD5 digest of the schema is a 16-byte string, interpreting it as UTF-8 yields only 13 characters, because some byte sequences are read as multibyte characters.
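
      The mismatch between character count and byte count can be reproduced on any Ruby by relabeling a binary string as UTF-8. The following is a minimal sketch, not the actual digest from the failing run: the byte values are made up for illustration and chosen so that three byte pairs happen to form valid two-byte UTF-8 sequences.

```ruby
# Illustrative 16-byte value (NOT a real MD5 digest from the report):
# three pairs that are valid two-byte UTF-8 sequences, then ten ASCII bytes.
bytes = [0xC3, 0xA9, 0xC3, 0xA8, 0xC3, 0xA0] + (0x61..0x6A).to_a

binary = bytes.pack('C*')                      # encoding is ASCII-8BIT (binary)
mislabeled = binary.dup.force_encoding(Encoding::UTF_8)

binary_size     = binary.size                  # counts bytes for a binary string
mislabeled_size = mislabeled.size              # counts characters under UTF-8
mislabeled_bytesize = mislabeled.bytesize      # always counts bytes

puts "binary.size         = #{binary_size}"
puts "mislabeled.size     = #{mislabeled_size}"
puts "mislabeled.bytesize = #{mislabeled_bytesize}"
```

      Here #size shrinks once the string is tagged as UTF-8, while #bytesize is unaffected by the encoding label.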

      Rather than trying to divine why JRuby is being weird here, I think this is an opportunity to fix Avro's handling of strings to make it robust against unexpected encodings.
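
      One way to make such a check independent of the string's encoding label is to compare byte lengths and to relabel the string as binary before writing it. The sketch below only illustrates that idea; the helper names valid_fixed? and as_binary are hypothetical and are not taken from the attached patches or from the Avro Ruby API.

```ruby
require 'digest/md5'

# Hypothetical helper: accept a datum for a fixed type of `expected_bytes`
# based on its byte length, regardless of the encoding label it carries.
def valid_fixed?(datum, expected_bytes)
  datum.is_a?(String) && datum.bytesize == expected_bytes
end

# Hypothetical helper: return the same bytes labeled as binary, without
# mutating the caller's string.
def as_binary(datum)
  return datum if datum.encoding == Encoding::BINARY
  datum.dup.force_encoding(Encoding::BINARY)
end

# Simulate the situation from the report: a 16-byte digest tagged as UTF-8.
digest = Digest::MD5.digest('example schema').dup.force_encoding(Encoding::UTF_8)

puts valid_fixed?(digest, 16)        # byte-based check
puts as_binary(digest).encoding      # label after normalization
```

      Relabeling with force_encoding changes only the encoding tag, not the underlying bytes, so the data written is identical either way.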


        1. AVRO-1783.patch
          3 kB
          Martin Kleppmann
        2. AVRO-1783.stack.text
          45 kB
          Ryan Blue
        3. AVRO-1783-2.patch
          6 kB
          Martin Kleppmann



            Assignee: martinkl Martin Kleppmann
            Reporter: martinkl Martin Kleppmann