Description
In the vote thread for Avro 1.8.0-rc2, busbey noticed that phunt's avro-rpc-quickstart fails:
busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
Avro::IO::AvroTypeError: The datum "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq" is not an example of schema {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
  write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
  write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
  each at org/jruby/RubyArray.java:1613
  write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
  write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
  write at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
  write_handshake_request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
  request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
  request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
  (root) at sample_ipc_client.rb:49
I tried reproducing the error, and it is quite strange. avro-rpc-quickstart works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, busbey was using JRuby 1.7.3 (as visible from the path names above), and in this particular version of JRuby I was able to reproduce the issue.
It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3 returns a UTF-8 encoded string from Digest::MD5.digest, rather than a binary-encoded string. Schema.validate checks whether a string is a suitable datum for a fixed type by calling #size, which counts characters rather than bytes. In this case, although the MD5 digest of the schema is a 16-byte string, if you interpret it as a UTF-8 encoded string, it consists of only 13 characters (i.e. some byte sequences are interpreted as multibyte characters), so the check against the fixed size of 16 fails.
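The effect of the encoding on #size can be seen without JRuby or Avro. The following is a minimal sketch using a made-up 16-byte value (not a real MD5 digest, and not the datum from the trace above) that was chosen so that it is valid UTF-8 and contains three two-byte sequences:

```ruby
# Hypothetical 16-byte value standing in for a digest: three 2-byte
# UTF-8 sequences (0xC3 0xA9) followed by ten ASCII bytes.
binary = ("\xC3\xA9" * 3 + "abcdefghij").b   # String#b => binary (ASCII-8BIT) copy

# The same bytes, but tagged as UTF-8, as JRuby 1.7.3 appears to do sometimes.
utf8 = binary.dup.force_encoding(Encoding::UTF_8)

binary_size   = binary.size      # characters == bytes for a binary string
utf8_size     = utf8.size        # each 0xC3 0xA9 pair counts as one character
utf8_bytesize = utf8.bytesize    # byte count is independent of the encoding

puts "binary size:   #{binary_size}"
puts "utf-8 size:    #{utf8_size}"
puts "utf-8 bytesize: #{utf8_bytesize}"
```

Only the encoding tag differs between the two strings; the underlying bytes are identical, which is why a character-based length check can reject a correctly sized digest.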
Rather than trying to divine why JRuby is being weird here, I think this is an opportunity to fix Avro's handling of strings to make it robust against unexpected encodings.
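One possible direction, sketched here under assumptions, is to compare byte lengths instead of character lengths, or to retag strings as binary before they are checked. The helper names below (fixed_datum_ok? and to_binary) are hypothetical and are not part of Avro's Ruby API; this is an illustration of the idea, not the actual patch:

```ruby
# Hypothetical helper: accept a datum for a fixed schema when its
# byte length matches, regardless of the string's encoding tag.
def fixed_datum_ok?(datum, expected_size)
  datum.is_a?(String) && datum.bytesize == expected_size
end

# Hypothetical helper: return a copy of the string tagged as binary,
# leaving the underlying bytes untouched.
def to_binary(str)
  str.dup.force_encoding(Encoding::BINARY)
end

# Made-up 16-byte sample that happens to be valid UTF-8 (13 characters).
sample = ("\xC3\xA9" * 3 + "abcdefghij").force_encoding(Encoding::UTF_8)

char_check = (sample.size == 16)            # character-based check
byte_check = fixed_datum_ok?(sample, 16)    # byte-based check
retagged   = to_binary(sample)

puts "character-based check passes: #{char_check}"
puts "byte-based check passes:      #{byte_check}"
puts "retagged encoding: #{retagged.encoding}, size: #{retagged.size}"
```

Either approach avoids depending on which encoding Digest::MD5.digest happens to report on a given Ruby implementation.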