  1. Apache Cassandra
  2. CASSANDRA-8101

Invalid ASCII and UTF-8 chars not rejected in CQL string literals


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Urgent
    • Resolution: Fixed
    • Fix Version/s: 2.0.11, 2.1.1
    • Component/s: None
    • Labels: None
    • Severity: Critical

    Description

      When processing CQL string literals, we ultimately use String.getBytes(Charset), which has the following note:

      This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array. The CharsetEncoder class should be used when more control over the encoding process is required.

      So, if we insert a non-ASCII character into an ascii string literal, it is silently replaced with a '?' character rather than being rejected. Something similar happens for UTF-8.

      For example:

      cqlsh:ks1> create table badstrings (a int primary key, b ascii);
      cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
      cqlsh:ks1> select * from badstrings;
      
       a | b
      ---+------
       0 | ????
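
The same replacement can be reproduced outside of Cassandra. The following is a minimal Java sketch, not the attached patch: the class name StrictEncoding and the helper encodeStrict are hypothetical names chosen for illustration. It contrasts String.getBytes(Charset), which substitutes the replacement byte, with a CharsetEncoder configured to report errors, which is the approach the Javadoc note points to.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncoding {
    // Hypothetical helper (not from the patch): encodes text with the given
    // charset and throws instead of substituting a replacement byte.
    static ByteBuffer encodeStrict(String text, Charset charset) throws CharacterCodingException {
        return charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
    }

    public static void main(String[] args) {
        // The literal from the example above: 'ΎΔδϠ'
        String literal = "\u038E\u0394\u03B4\u03E0";

        // Lossy path: each unmappable character becomes the byte for '?'.
        byte[] lossy = literal.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(lossy, StandardCharsets.US_ASCII));

        // Strict path: the encoder reports the unmappable character.
        try {
            encodeStrict(literal, StandardCharsets.US_ASCII);
            System.out.println("accepted");
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

With a strict encoder, a caller can turn the exception into a validation error for the statement instead of storing altered data. The same helper also rejects input that is not encodable as UTF-8, such as a string containing an unpaired surrogate.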
      

      Attachments

        1. 8101.txt
          5 kB
          Tom Hobbs

          People

            thobbs Tom Hobbs
            thobbs Tom Hobbs
            Tom Hobbs
            Aleksey Yeschenko
            Votes: 0
            Watchers: 3
