CASSANDRA-4003

cqlsh still failing to handle decode errors in some column names

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 1.0.9, 1.1.0
    • Component/s: Tools

      Description

      Column names which are expected to be text, but which are not valid UTF-8, cause cqlsh to display an error and not show any output:

      cqlsh:ks> CREATE COLUMNFAMILY test (a text PRIMARY KEY) WITH comparator = timestamp;
      cqlsh:ks> INSERT INTO test (a, '2012-03-05') VALUES ('val1', 'val2');
      cqlsh:ks> ASSUME test NAMES ARE text;
      cqlsh:ks> select * from test;
      'utf8' codec can't decode byte 0xe1 in position 4: invalid continuation byte
      

      The traceback with cqlsh --debug:

      Traceback (most recent call last):
        File "bin/cqlsh", line 581, in onecmd
          self.handle_statement(st)
        File "bin/cqlsh", line 606, in handle_statement
          return custom_handler(parsed)
        File "bin/cqlsh", line 663, in do_select
          self.perform_statement_as_tokens(parsed.matched, decoder=decoder)
        File "bin/cqlsh", line 666, in perform_statement_as_tokens
          return self.perform_statement(cqlhandling.cql_detokenize(tokens), decoder=decoder)
        File "bin/cqlsh", line 693, in perform_statement
          self.print_result(self.cursor)
        File "bin/cqlsh", line 728, in print_result
          self.print_static_result(cursor)
        File "bin/cqlsh", line 742, in print_static_result
          formatted_names = map(self.myformat_colname, colnames)
        File "bin/cqlsh", line 413, in myformat_colname
          wcwidth.wcswidth(name.decode(self.output_codec.name)))
        File "/usr/local/Cellar/python/2.7.2/lib/python2.7/encodings/utf_8.py", line 16, in decode
          return codecs.utf_8_decode(input, errors, True)
      UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 4: invalid continuation byte
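The crash is an unguarded decode in myformat_colname. A minimal sketch of a defensive alternative (illustrative only, and not the committed fix, which instead looks up the column's Cassandra name type): try the configured codec and fall back to a blob-style hex rendering when the bytes are invalid:

```python
import binascii

def format_colname_safe(name, codec="utf-8"):
    """Decode a raw column name for display; if the bytes are not valid
    in the given codec, fall back to a blob-style hex rendering instead
    of raising UnicodeDecodeError. (Illustrative sketch only.)"""
    try:
        return name.decode(codec)
    except UnicodeDecodeError:
        return "0x" + binascii.hexlify(name).decode("ascii")

# 0xe1 begins a 3-byte UTF-8 sequence, so a following ASCII byte is an
# invalid continuation byte -- the same failure shown in the traceback:
print(format_colname_safe(b"col\xe1x"))   # falls back to 0x636f6ce178
print(format_colname_safe(b"plain"))      # decodes normally
```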
      
      Attachments:
      • 4003-2.txt (1 kB, paul cannon)

          Activity

          paul cannon added a comment -

          A fix is pushed to my 4003 github branch, also tagged at https://github.com/thepaul/cassandra/tree/pending/4003 .

          Jonathan Ellis added a comment -

          It looks like the core of the fix is this:

          +    def get_nametype(self, cursor, num):
          +        """
          +        Determine the Cassandra type of a column name from the current row of
          +        query results on the given cursor. The column in question is given by
          +        its zero-based ordinal number within the row.
          +
          +        Pretty big hack, but necessary to differentiate some things like ascii
          +        vs. blob hex. Probably this should be available from the driver
          +        somehow, instead.
          +        """
          +
          +        row = cursor.result[cursor.rs_idx - 1]
          +        col = row.columns[num]
          +        schema = cursor.decoder.schema
          +        return schema.name_types.get(col.name, schema.default_name_type)
          

          Can you elaborate as to what's going on?

          paul cannon added a comment -

          Sure. Since the CQL driver deserializes column names before the client software (cqlsh) can see them, and does not expose the Cassandra data type for the column names, it was not always possible to determine from returned column names how they were meant to be interpreted. For example, it was sometimes impossible to tell TimeUUIDType from UUIDType, or any of the various integer or counter types apart, or even BytesType from AsciiType.
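          The ascii-vs-blob ambiguity is easy to see: the same raw bytes are a valid value for either type, so once the driver hands back only the decoded name, cqlsh cannot know which rendering was intended. A small illustration:

          ```python
          data = b"cafe"

          # If the column name type is AsciiType, cqlsh should show the text:
          as_ascii = data.decode("ascii")   # "cafe"

          # If it is BytesType, cqlsh should show blob hex instead:
          as_blob = data.hex()              # "63616665"

          print(as_ascii, as_blob)
          ```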

          Cqlsh makes an effort to display data in the most meaningful form, and secondarily to visually distinguish data that would otherwise be too ambiguous using colors. So it needs to know the original column name type.

          The CQL driver does not expose that, so this code uses internals to get it. Clearly it would make more sense to expose the info from the driver side, and I plan to do that, but it takes some extra process and testing. This hack is backwards compatible with older CQL driver versions, but possibly not forwards-compatible.

          Maybe it would be best to do a runtime check against the driver to see if it supports exposing column types before making this call.
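          Such a runtime check can be a simple capability probe rather than a version comparison. The attribute name below is hypothetical, just to show the shape of the check:

          ```python
          def driver_exposes_name_types(cursor):
              """Return True if the installed CQL driver exposes column name
              types. The 'name_info' attribute is a hypothetical stand-in for
              whatever interface the driver actually provides."""
              return getattr(cursor, "name_info", None) is not None

          class OldDriverCursor:
              pass

          class NewDriverCursor:
              name_info = [("a", "AsciiType")]

          print(driver_exposes_name_types(OldDriverCursor()))  # False
          print(driver_exposes_name_types(NewDriverCursor()))  # True
          ```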

          paul cannon added a comment -

          The attached patch (also present in my updated 4003 branch) will check for column name-type support in the CQL driver before using the direct-inspection approach. python-cql version 1.0.10 supports this, but we don't need to require support for 1.0.10 yet.

          paul cannon added a comment -

          Since I linked the previous tag, the new one is at https://github.com/thepaul/cassandra/tree/pending/4003-2 .

          Brandon Williams added a comment -

          Committed.

          Jeremy Hanna added a comment -

          This should be fixed in python-cql 1.0.10 and Cassandra 1.0.9, right? I'm using DSE 2.1, which is based on Cassandra 1.0.10, and apt-show-versions reports "python-cql/stable uptodate 1.0.10-1". I'm still getting this error:

          'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)

          Is there still a way for this to come up? I'm using a solr_query in the WHERE clause, by the way, if that makes any difference.

          paul cannon added a comment -

          That shouldn't make a difference. Can you tell me what data is in that cf/column and paste the full output of a minimal "cqlsh --debug" session with that query?


            People

            • Assignee: paul cannon
            • Reporter: paul cannon
            • Reviewer: Brandon Williams