IMPALA-607

Impala Shell throws UnicodeDecode errors when executing a query containing unicode characters


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version: Impala 1.1.1

    Description

      The following query throws an exception when the shell prints it to the screen:

      SELECT COUNT(*) FROM <table> WHERE content LIKE '%ąśćó%';
      

      Running this query raises an exception in our print_to_stderr method while the shell echoes the query to stderr. This is something of a red herring: even if the print succeeded, a lower-level exception would still be thrown further along.

      Error before enabling quiet mode:

      Traceback (most recent call last):
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/bin/../lib/impala-shell/impala_shell.py", line 1023, in ?
          shell.cmdloop(intro)
        File "/usr/lib64/python2.4/cmd.py", line 142, in cmdloop
          stop = self.onecmd(line)
        File "/usr/lib64/python2.4/cmd.py", line 219, in onecmd
          return func(arg)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/bin/../lib/impala-shell/impala_shell.py", line 691, in do_select
          return self.__execute_query(query)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/bin/../lib/impala-shell/impala_shell.py", line 480, in __execute_query
          self.__print_if_verbose("Query: %s" % (query.query,))
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/bin/../lib/impala-shell/impala_shell.py", line 587, in __print_if_verbose
          print_to_stderr(message)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/bin/../lib/impala-shell/impala_shell.py", line 896, in print_to_stderr
          print >>sys.stderr, message
      UnicodeEncodeError: 'ascii' codec can't encode characters in position 61-64: ordinal not in range(128)
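
      The failure above comes from implicitly encoding a non-ASCII message with the 'ascii' codec. A minimal sketch of one way to avoid it is to encode explicitly, with a fallback encoding and a replacement policy, before writing. This is written for Python 3, whereas the shell in this report runs on Python 2.4, and the names encode_for_stream and print_to_stderr_safely are hypothetical helpers, not code from impala_shell.py.

```python
import sys


def encode_for_stream(message, encoding=None, errors="replace"):
    """Encode text for a stream, never raising on unencodable characters.

    Hypothetical helper: falls back to UTF-8 when the stream reports no
    encoding, and substitutes '?' for characters the codec cannot represent.
    """
    target = encoding or "utf-8"
    return message.encode(target, errors)


def print_to_stderr_safely(message):
    """Write a text message to stderr as explicitly encoded bytes."""
    encoding = getattr(sys.stderr, "encoding", None)
    data = encode_for_stream(message, encoding)
    raw = getattr(sys.stderr, "buffer", None)
    if raw is not None:
        # Byte-level write: bypasses the implicit text encoding step.
        raw.write(data + b"\n")
        raw.flush()
    else:
        # stderr was replaced by a text-only object; hand it decoded text.
        sys.stderr.write(data.decode(encoding or "utf-8") + "\n")
```

      With an ASCII-only stream, encode_for_stream("Query: ąśćó", "ascii") yields b"Query: ????" instead of raising, so the query still reaches the next stage.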
      

      Running the shell with --quiet essentially skips this method, so execution reaches the next place where the exception is thrown. Because the shell normally handles that exception (line 830 of impala_shell.py in 1.1.1), I removed lines 830 through 833 to leave it unhandled, which shows where the error is coming from (TTransport).

      Error after enabling quiet mode (execution gets further than before):

      Exception in thread Thread-2:
      Traceback (most recent call last):
        File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
          self.run()
        File "/usr/lib64/python2.4/threading.py", line 422, in run
          self.__target(*self.__args, **self.__kwargs)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/bin/../lib/impala-shell/impala_shell.py", line 808, in __do_rpc_thread
          ret = rpc()
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/bin/../lib/impala-shell/impala_shell.py", line 482, in <lambda>
          (handle, status) = self.__do_rpc(lambda: self.imp_service.query(query))
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala-shell/gen-py/beeswaxd/BeeswaxService.py", line 141, in query
          self.send_query(query)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala-shell/gen-py/beeswaxd/BeeswaxService.py", line 148, in send_query
          args.write(self._oprot)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala-shell/gen-py/beeswaxd/BeeswaxService.py", line 770, in write
          self.query.write(oprot)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala-shell/gen-py/beeswaxd/ttypes.py", line 110, in write
          oprot.writeString(self.query)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala-shell/lib/thrift/protocol/TBinaryProtocol.py", line 123, in writeString
          self.trans.write(str)
        File "/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala-shell/lib/thrift/transport/TTransport.py", line 163, in write
          self.__wbuf.write(buf)
      UnicodeEncodeError: 'ascii' codec can't encode characters in position 54-57: ordinal not in range(128)
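
      The second traceback fails when a text query containing non-ASCII characters is written to a byte-oriented buffer. As a rough sketch only, the query could be encoded to UTF-8 bytes before it is written. Here io.BytesIO stands in for the transport's write buffer, to_wire_bytes and write_query are hypothetical names, and the code targets Python 3. It is not the Thrift API and not necessarily how this issue was resolved.

```python
import io


def to_wire_bytes(query, encoding="utf-8"):
    """Return the query as bytes; values that are already bytes pass through."""
    if isinstance(query, bytes):
        return query
    return query.encode(encoding)


def write_query(buffer, query):
    """Encode the query explicitly, write it, and return the byte count."""
    payload = to_wire_bytes(query)
    buffer.write(payload)
    return len(payload)


if __name__ == "__main__":
    buf = io.BytesIO()  # stand-in for the transport's write buffer
    write_query(buf, "SELECT COUNT(*) FROM t WHERE content LIKE '%ąśćó%'")
    print(buf.getvalue())
```

      Encoding at the boundary keeps the decision about the wire encoding in one place, rather than relying on whatever default codec the buffer applies.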
      

          People

            Ishaan Joshi (ishaan)
            Ricky Saltzer (rickysaltzer)