Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 0.11.0
- None
- Ubuntu 16.04.6 LTS
Description
Impala currently uses thrift-0.11.0 on the client side and thrift-0.9.3 on the server side (the server-side upgrade is blocked by some issues). We encountered an issue decoding UTF-8 bytes on the client side: the result contains a partial UTF-8 code point, and Thrift does not handle the resulting error gracefully. The stack trace:
Traceback (most recent call last):
  File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1210, in _do_beeswax_rpc
    ret = rpc()
  File "/home/quanlong/workspace/Impala/shell/impala_client.py", line 1113, in <lambda>
    self.fetch_size))
  File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 254, in fetch
    return self.recv_fetch()
  File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 275, in recv_fetch
    result.read(iprot)
  File "/home/quanlong/workspace/Impala/shell/build/thrift-11-gen/gen-py/beeswaxd/BeeswaxService.py", line 1410, in read
    iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe6 in position 3: unexpected end of data
This is similar to THRIFT-2087, but the error happens at the boundary between the Python and C++ code. As with THRIFT-2087, we need to provide error-handling behavior for decoding UTF-8 bytes in TBinaryProtocolAccelerated._fast_decode. The relevant code is at https://github.com/apache/thrift/blob/0.11.0/lib/py/src/ext/protocol.tcc#L708
case T_STRING: {
  char* buf = NULL;
  int len = impl()->readString(&buf);
  if (len < 0) {
    return NULL;
  }
  if (isUtf8(typeargs)) {
    return PyUnicode_DecodeUTF8(buf, len, 0);  // <--- Needs to provide an error handling method here
  } else {
    return PyBytes_FromStringAndSize(buf, len);
  }
}
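To illustrate the failure mode, here is a minimal Python sketch, not Thrift code. The third argument of PyUnicode_DecodeUTF8 is the name of the error handler, and passing 0 (NULL) selects strict decoding, which is the Python-level equivalent of bytes.decode("utf-8", "strict"). The helper name decode_field and the sample payload are assumptions made for this example. Which handler is appropriate for the actual fix is not stated in this report.

```python
# Hypothetical illustration only; decode_field is an assumed helper, not a Thrift API.

# "abc" followed by the first byte of a 3-byte code point (U+6C49 is E6 B1 89).
# Dropping the last two bytes leaves a partial code point, as in the traceback.
PARTIAL = "abc\u6c49".encode("utf-8")[:-2]  # b"abc\xe6"


def decode_field(raw: bytes, errors: str = "strict") -> str:
    """Decode a UTF-8 string field using the given error handler name."""
    return raw.decode("utf-8", errors)


if __name__ == "__main__":
    # "strict" mirrors passing NULL/0 as the errors argument in the C API.
    try:
        decode_field(PARTIAL)
    except UnicodeDecodeError as exc:
        print("strict:", exc.reason, "at position", exc.start)

    # "replace" substitutes U+FFFD for the undecodable byte.
    print("replace:", repr(decode_field(PARTIAL, "replace")))

    # "ignore" silently drops the undecodable byte.
    print("ignore:", repr(decode_field(PARTIAL, "ignore")))
```

With "replace" the caller still sees that data was damaged, while "ignore" hides it. Either avoids the unhandled exception shown in the stack trace.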
Issue Links
- causes
  - IMPALA-10299 Impala-shell hangs in printing partial UTF-8 characters (Resolved)
  - IMPALA-10145 UnicodeDecodeError in Thrift 0.11.0 generated files (Resolved)
- relates to
  - THRIFT-2087 unicode decode errors (Closed)
- links to