CASSANDRA-5225

Missing columns, errors when requesting specific columns from wide rows

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version/s: 1.2.2
    • Component/s: Core
    • Labels:
      None

      Description

      With Cassandra 1.2.1 (and probably 1.2.0), I'm seeing some problems with Thrift queries that request a set of specific column names when the row is very wide.

      To reproduce, I'm inserting 10 million columns into a single row and then randomly requesting three columns by name in a loop. It's common for only one or two of the three columns to be returned. I'm also seeing stack traces like the following in the Cassandra log:

      ERROR 13:12:01,017 Exception in thread Thread[ReadStage:76,5,main]
      java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 (/var/lib/cassandra/data/Keyspace1/CF1/Keyspace1-CF1-ib-5-Data.db, 14035168 bytes remaining)
      	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1576)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 (/var/lib/cassandra/data/Keyspace1/CF1/Keyspace1-CF1-ib-5-Data.db, 14035168 bytes remaining)
      	at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:69)
      	at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
      	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
      	at org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:133)
      	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
      	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1358)
      	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1215)
      	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1127)
      	at org.apache.cassandra.db.Table.getRow(Table.java:355)
      	at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:64)
      	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
      	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1572)
      	... 3 more
      

      This doesn't seem to happen when the row is smaller, so it might have something to do with incremental large row compaction.
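
The reproduction steps described above can be sketched roughly as follows. This is a minimal illustration, not the attached pycassa-repro.py script: the column naming scheme, the `find_missing_columns` helper, and the `InMemoryColumnFamily` stand-in are all hypothetical, and the loop only assumes a client object that exposes a pycassa-style `get(key, columns=[...])` call returning a mapping of the columns it found.

```python
import random


def column_name(i):
    # Hypothetical naming scheme: zero-padded names for the wide row.
    return "col%08d" % i


def find_missing_columns(cf, row_key, total_columns, iterations,
                         per_request=3, seed=0):
    """Request `per_request` random columns by name, `iterations` times.

    `cf` is assumed to be any object with a pycassa-style
    `get(key, columns=[...])` method returning a mapping of found columns.
    Returns a list of (requested, missing) tuples for short reads.
    """
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        indexes = rng.sample(range(total_columns), per_request)
        requested = [column_name(i) for i in indexes]
        returned = cf.get(row_key, columns=requested)
        missing = [name for name in requested if name not in returned]
        if missing:
            failures.append((requested, missing))
    return failures


class InMemoryColumnFamily(object):
    """Stand-in for a real column family so the sketch runs without a cluster."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key, columns=None):
        row = self.rows.get(key, {})
        return dict((name, row[name]) for name in columns if name in row)


if __name__ == "__main__":
    cf = InMemoryColumnFamily()
    total = 1000  # the report used 10 million columns in a single row
    cf.insert("wide_row", dict((column_name(i), "x") for i in range(total)))
    # Every column exists, so a correct read path reports no short reads.
    print(find_missing_columns(cf, "wide_row", total, iterations=100))
```

Against a real cluster, the stand-in would be replaced by an actual client object. A read that returns only one or two of the three requested names would then show up as an entry in the returned list. A client may raise an exception rather than return an empty mapping when no columns are found, so that case may need separate handling.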

      Attachments

      1. 5225.txt (3 kB, Sylvain Lebresne)
      2. corrected-pycassa-repro.py (1 kB, Daniel Meyer)
      3. pycassa-repro.py (1 kB, Tyler Hobbs)

            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Tyler Hobbs
              Reviewer:
              Brandon Williams
              Tester:
              Daniel Meyer
            • Votes:
              0
              Watchers:
              10
