Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version: 0.20.2
Description
I'm trying to access binary data stored in HDFS (in my case, TypedBytes files generated by Dumbo) through Thrift, talking to org.apache.hadoop.thriftfs.HadoopThriftServer, and the data I get back is mangled. For example, when I read a file that contains the byte 0xa2, it comes back as 0xef 0xbf 0xbd, which is the UTF-8 encoding of the Unicode replacement character (U+FFFD).
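I think this is because the read method in HadoopThriftServer.java decodes the bytes read from HDFS as UTF-8 text via the String() constructor. Any byte sequence that is not valid UTF-8 gets replaced during that decoding, so the original bytes are lost.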
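The effect can be reproduced outside Hadoop. The following is a minimal sketch, not the server's actual code: the class name Utf8RoundTrip and its helper methods are made up for illustration, and it assumes the server decodes the bytes as UTF-8 and the client later re-encodes the resulting string as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Decode raw bytes as UTF-8 text, then encode the text back to bytes.
    // Malformed input (such as a lone 0xa2) is replaced with U+FFFD on decode.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated hex, e.g. "0xef 0xbf 0xbd".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xa2 };
        // prints "0xa2 -> 0xef 0xbf 0xbd"
        System.out.println(hex(raw) + " -> " + hex(roundTrip(raw)));
    }
}
```

A single 0xa2 byte is a UTF-8 continuation byte with no lead byte, so it is not valid on its own and the decoder substitutes the replacement character.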
This essentially makes the HDFS Thrift API useless for me.
I'm not an expert on Thrift, but would it be possible to modify the API so that it uses the binary type listed on http://wiki.apache.org/thrift/ThriftTypes?