Cassandra
  1. Cassandra
  2. CASSANDRA-4031

Exceptions during inserting emtpy string as column value on indexed column

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels:
      None

      Description

      Hi,
      I`m running one node cluster(issue occurs also on other cluster(which has 2 nodes)) on snapshot from cassandra-1.1 branch(i used 449e037195c3c504d7aca5088e8bc7bd5a50e7d0 commit).
      i have simple CF, definition of TestCF:

      [default@test_keyspace] describe Test_CF;
          ColumnFamily: Test_CF
            Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
            Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
            Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
            GC grace seconds: 864000
            Compaction min/max thresholds: 4/32
            Read repair chance: 1.0
            DC Local Read repair chance: 0.0
            Replicate on write: true
            Caching: KEYS_ONLY
            Bloom Filter FP chance: default
            Built indexes: [Test_CF.Test_CF_test_index_idx]
            Column Metadata:
              Column Name: test_index
                Validation Class: org.apache.cassandra.db.marshal.UTF8Type
                Index Name: Test_CF_test_index_idx
                Index Type: KEYS
            Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
            Compression Options:
              sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
      

      I`m trying to add new row(log from cassandra-cli, note that there is index on test_index):

      [default@test_keyspace] list Test_CF;              
      Using default limit of 100
      
      0 Row Returned.
      Elapsed time: 31 msec(s).
      [default@test_keyspace] set Test_CF[absdsad3][test_index]='';
      null
      TimedOutException()
      	at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:15906)
      	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
      	at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:788)
      	at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:772)
      	at org.apache.cassandra.cli.CliClient.executeSet(CliClient.java:894)
      	at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:211)
      	at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219)
      	at org.apache.cassandra.cli.CliMain.main(CliMain.java:346)
      [default@test_keyspace] list Test_CF;                        
      Using default limit of 100
      -------------------
      RowKey: absdsad3
      => (column=test_index, value=, timestamp=1331298173009000)
      
      1 Row Returned.
      Elapsed time: 7 msec(s).
      

      Exception from system.log:

       INFO [FlushWriter:56] 2012-03-09 13:42:02,500 Memtable.java (line 291) Completed flushing /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-3251-Data.db (2077 bytes)
      ERROR [MutationStage:2291] 2012-03-09 13:42:22,232 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[MutationStage:2291,5,main]
      java.lang.AssertionError
              at org.apache.cassandra.db.DecoratedKey.<init>(DecoratedKey.java:55)
              at org.apache.cassandra.db.index.SecondaryIndexManager.getIndexKeyFor(SecondaryIndexManager.java:294)
              at org.apache.cassandra.db.index.SecondaryIndexManager.applyIndexUpdates(SecondaryIndexManager.java:490)
              at org.apache.cassandra.db.Table.apply(Table.java:441)
              at org.apache.cassandra.db.Table.apply(Table.java:366)
              at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:275)
              at org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:446)
              at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1228)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      
      1. 4031.txt
        3 kB
        Yuki Morishita

        Activity

        Hide
        Jonathan Ellis added a comment -

        committed

        Show
        Jonathan Ellis added a comment - committed
        Hide
        Yuki Morishita added a comment -

        For v1.1, I propose not to check for empty key inside DK, removing assertion in its constructor.
        I also added key validation check to CQL insert/update so that no empty key gets inserted(CLI/thrift already do validation).

        At first I tried to allow empty key DK only in secondary index by creating static method that generates DK with empty key, but it turned out to be redundant because Memtable#put recreates DK when storing.

        Show
        Yuki Morishita added a comment - For v1.1, I propose not to check for empty key inside DK, removing assertion in its constructor. I also added key validation check to CQL insert/update so that no empty key gets inserted(CLI/thrift already do validation). At first I tried to allow empty key DK only in secondary index by creating static method that generates DK with empty key, but it turned out to be redundant because Memtable#put recreates DK when storing.
        Hide
        Yuki Morishita added a comment -

        Same error happens when performing searching for empty value(='') on indexed column.

        I agree with Sylvain, empty row key should not be allowed. But for version 1.1, I think it is fine to use empty row key on secondary indices, otherwise we have to perform full data scan and filter out all that have empty value on indexed column(or refuse query which has "=''").

        I will fix this by adding empty key DK only for secondary index.

        Show
        Yuki Morishita added a comment - Same error happens when performing searching for empty value(='') on indexed column. I agree with Sylvain, empty row key should not be allowed. But for version 1.1, I think it is fine to use empty row key on secondary indices, otherwise we have to perform full data scan and filter out all that have empty value on indexed column(or refuse query which has "=''"). I will fix this by adding empty key DK only for secondary index.
        Hide
        Sylvain Lebresne added a comment -

        It's because indexed value are used as row key (in the index). Previously, empty row key was actually allowed by the code (but imo it was more of a bug we were not aware of).

        The right fix may probably be to refuse empty value, though that could appear as a limitation. The other solution would be to create a special constructor for DK that allows an empty byte buffer. But I couldn't swear that other part of the code don't break subtly with empty row keys.

        Show
        Sylvain Lebresne added a comment - It's because indexed value are used as row key (in the index). Previously, empty row key was actually allowed by the code (but imo it was more of a bug we were not aware of). The right fix may probably be to refuse empty value, though that could appear as a limitation. The other solution would be to create a special constructor for DK that allows an empty byte buffer. But I couldn't swear that other part of the code don't break subtly with empty row keys.

          People

          • Assignee:
            Yuki Morishita
            Reporter:
            Mariusz
            Reviewer:
            Jonathan Ellis
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development