Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-18736

Streaming exception race creates corrupt transaction log files that prevent restart




      On restart, Cassandra logs this message and terminates.

      ERROR 2023-07-17T17:17:22,931 [main] org.apache.cassandra.db.lifecycle.LogTransaction:561 - Unexpected disk state: failed to read transaction log [nb_txn_stream_39d5f6b0-fb81-11ed-8f46-e97b3f61511e.log in /datadir1/keyspace/table-c9527530a0d611e8813f034699fc9043]
      Files and contents follow:
                      ***This record should have been the last one in all replicas

      The root cause is a race during streaming exception handling.

      Although concurrent modification of to the LogTransaction was added for CASSANDRA-16225, there is nothing to prevent usage after the transaction is completed (committed/aborted) once it has been processed by TransactionTidier (after the last reference is released). Before the transaction is tidied, the LogFile keeps a list of records that are checked for completion before adding new entries. In TransactionTidier LogFile.records are cleared as no longer needed, however the LogTransaction/LogFile is still accessible to the stream.

      The changes in CASSANDRA-17273 added a parallel set of onDiskRecords that could be used to reliably recreate the transaction log at any new datadirs the same as the existing
      datadirs - regardless of the effect of LogTransaction.untrackNew/LogFile.remove

      If a streaming exception causes the LogTransaction to be aborted and tidied just before SimpleSSTableMultiWriter calls trackNew to add a new sstable. At the time of the call, the LogFile will not contain any LogReplicas,
      LogFile.records will be empty, and LogFile.onDiskRecords will contain an ABORT.

      When LogTransaction.trackNew/LogFile.add is called, the check for completed transaction fails as records is empty, there are no replicas on the datadir, so maybeCreateReplicas creates a new txnlog file replica containing ABORT, then
      appends an ADD record.

      The LogFile has already been tidied after the abort so the txnlog file is not removed and sits on disk until a restart, causing the faiulre.

      There is a related exception caused with a different interleaving of aborts, after an sstable is added, however this is just a nuisance in the logs as the LogRelica is already created with an ADD record first.

      java.lang.AssertionError: [ADD:[/datadir1/keyspace/table/nb-23314378-big-,0,8][1869379820]] is not tracked by 55be35b0-35d1-11ee-865d-8b1e3c48ca06
              at org.apache.cassandra.db.lifecycle.LogFile.remove(LogFile.java:388)
              at org.apache.cassandra.db.lifecycle.LogTransaction.untrackNew(LogTransaction.java:158)
              at org.apache.cassandra.db.lifecycle.LifecycleTransaction.untrackNew(LifecycleTransaction.java:577)
              at org.apache.cassandra.db.streaming.CassandraStreamReceiver$1.untrackNew(CassandraStreamReceiver.java:149)
              at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.abort(SimpleSSTableMultiWriter.java:95)
              at org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter.abort(RangeAwareSSTableWriter.java:191)
              at org.apache.cassandra.db.streaming.CassandraCompressedStreamReader.read(CassandraCompressedStreamReader.java:115)
              at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:85)
              at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:53)
              at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:38)
              at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:53)
              at org.apache.cassandra.streaming.async.StreamingInboundHandler$StreamDeserializingTask.run(StreamingInboundHandler.java:172)
              at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
              at java.base/java.lang.Thread.run(Thread.java:829)


        1. 5.0-ci_summary.html
          42 kB
          Jon Meredith
        2. 4.0-ci_summary.html
          16 kB
          Jon Meredith
        3. 4.1-ci_summary.html
          8 kB
          Jon Meredith
        4. trunk-ci_summary.html
          3 kB
          Jon Meredith



            jonmeredith Jon Meredith
            jonmeredith Jon Meredith
            Jon Meredith
            Caleb Rackliffe, David Capwell
            1 Vote for this issue
            6 Start watching this issue



              Time Tracking

                Original Estimate - Not Specified
                Not Specified
                Remaining Estimate - 0h
                Time Spent - 2.5h