  1. IMPALA
  2. IMPALA-11325

Impala-shell hits UnicodeDecodeError when outputting Unicode via --output_file


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 4.1.0
    • Fix Version/s: Impala 4.2.0, Impala 4.1.1
    • Component/s: Clients
    • Labels: None

    Description

      When running impala-shell and writing Unicode output to a file via --output_file, the query fails:

      ishell -B -q "select '引'" --output_file=joetest3.txt
      /home/joe/view2/Impala/shell/option_parser.py:359: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
        if '--live_progress' in sys.argv and '--disable_live_progress' in sys.argv:
      /home/joe/view2/Impala/shell/option_parser.py:363: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
        if '--strict_hs2_protocol' in sys.argv:
      /home/joe/view2/Impala/shell/option_parser.py:369: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
        if '--verbose' in sys.argv and '--quiet' in sys.argv:
      Starting Impala Shell with no authentication using Python 2.7.16
      Warning: live_progress only applies to interactive shell sessions, and is being skipped for now.
      Opened TCP connection to localhost:21050
      Connected to localhost:21050
      Server version: impalad version 4.1.0-SNAPSHOT DEBUG (build 4236c307b971881a3b1d85068db5b053a9c34cfa)
      Query: select '引'
      Query submitted at: 2022-05-31 08:31:50 (Coordinator: http://joemcdonnell:25000)
      Query progress can be monitored at: http://joemcdonnell:25000/query_plan?query_id=2347462fe8a18544:bbeedc1800000000
      UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128) 
      Please check for columns containing binary data to find the possible source of the error.
      Could not execute command: select '引'

      This is specific to file output. This same query works if outputting to the console.

      This line seems to be the problem:

              with open(self.filename, 'ab') as out_file:
                # Note that instances of this class do not persist, so it's fine to
                # close the file handle after each write.
                out_file.write(formatted_data.encode('utf-8'))  # file opened in binary mode <--------
                out_file.write(b'\n')
      

      https://github.com/apache/impala/blob/master/shell/shell_output.py#L115

      It seems to work if we remove the .encode('utf-8').
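A likely explanation, given that the log shows Python 2.7.16: if formatted_data is already a UTF-8 byte string at this point, Python 2 handles .encode('utf-8') on it by first decoding the bytes with the default 'ascii' codec, and that implicit decode is what raises the UnicodeDecodeError on byte 0xe5. The sketch below illustrates one way to make the write tolerant of both bytes and text. The helper names to_utf8_bytes, append_line and ascii_decode_error are hypothetical and not part of impala-shell, and this is not necessarily the fix that was committed.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def to_utf8_bytes(data):
    """Return UTF-8 bytes for text or for already-encoded bytes.

    Hypothetical helper. On Python 2, `bytes` is an alias for `str`, so a
    byte string is passed through untouched instead of being re-encoded.
    """
    if isinstance(data, bytes):
        return data
    return data.encode("utf-8")


def append_line(filename, formatted_data):
    """Append one line to a file opened in binary mode (hypothetical helper)."""
    with open(filename, "ab") as out_file:
        out_file.write(to_utf8_bytes(formatted_data))
        out_file.write(b"\n")


def ascii_decode_error(raw):
    """Reproduce the decode that Python 2 performs implicitly inside
    str.encode(); return the error message, or None if decoding succeeds."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    encoded = u"引".encode("utf-8")  # the bytes the shell would hold on Python 2
    print(ascii_decode_error(encoded))

    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        append_line(path, encoded)  # already bytes: written as-is
        append_line(path, u"引")    # text: encoded once
        with open(path, "rb") as f:
            print(repr(f.read()))
    finally:
        os.remove(path)
```

Checking the type before encoding keeps the file in binary mode and avoids relying on whatever the default codec happens to be.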

          People

            Assignee: Joe McDonnell (joemcdonnell)
            Reporter: Joe McDonnell (joemcdonnell)