CASSANDRA-8101

Invalid ASCII and UTF-8 chars not rejected in CQL string literals

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Urgent
    • Resolution: Fixed
    • Fix Version/s: 2.0.11, 2.1.1
    • Component/s: None
    • Labels:
      None
    • Severity:
      Critical

      Description

      When processing CQL string literals, we ultimately use String.getBytes(Charset), which has the following note:

      This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array. The CharsetEncoder class should be used when more control over the encoding process is required.

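      So, if we insert a non-ASCII character into an ascii string literal, it is silently replaced with a ? character instead of being rejected. Something similar happens for UTF-8: input that cannot be encoded, such as an unpaired surrogate, is replaced rather than rejected.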

      For example:

      cqlsh:ks1> create table badstrings (a int primary key, b ascii);
      cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
      cqlsh:ks1> select * from badstrings;
      
       a | b
      ---+------
       0 | ????
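
      The silent replacement can be reproduced outside Cassandra. The following is a minimal sketch, not the attached patch: the class name StrictEncodingSketch and the helpers isEncodable and lossyRoundTrip are hypothetical names introduced only for illustration. It contrasts String.getBytes(Charset) with a CharsetEncoder configured to report errors, which is the kind of stricter control the Javadoc note points to.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncodingSketch {

    // Hypothetical helper: returns true only if every character of s
    // can be encoded in cs without replacement.
    public static boolean isEncodable(String s, Charset cs) {
        try {
            cs.newEncoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown for unmappable characters and for malformed input.
            return false;
        }
    }

    // Hypothetical helper: reproduces the lossy path by encoding with
    // getBytes and decoding the result with the same charset.
    public static String lossyRoundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // The four Greek characters from the cqlsh example, written as
        // escapes so the source file encoding does not matter.
        String greek = "\u038E\u0394\u03B4\u03E0";

        // getBytes substitutes the replacement byte for each character.
        System.out.println(lossyRoundTrip(greek, StandardCharsets.US_ASCII)); // ????

        // The strict encoder reports the problem instead.
        System.out.println(isEncodable(greek, StandardCharsets.US_ASCII));    // false
        System.out.println(isEncodable(greek, StandardCharsets.UTF_8));       // true

        // An unpaired surrogate cannot be encoded as UTF-8.
        String loneSurrogate = String.valueOf((char) 0xD800);
        System.out.println(isEncodable(loneSurrogate, StandardCharsets.UTF_8)); // false
    }
}
```

      A check along the lines of isEncodable would let invalid literals be rejected with an error rather than stored as ? characters; where such a check belongs in the CQL processing path is not specified here.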
      

        Attachments

        1. 8101.txt
          5 kB
          Tom Hobbs

            People

            • Assignee:
              thobbs Tom Hobbs
              Reporter:
              thobbs Tom Hobbs
              Authors:
              Tom Hobbs
              Reviewers:
              Aleksey Yeschenko
            • Votes:
              0
              Watchers:
              3
