Apache Cassandra / CASSANDRA-14554

Streaming needs to synchronise access to LifecycleTransaction

Details

    • Correctness - API / Semantic Implementation
    • Normal
    • Challenging
    • Adhoc Test

    Description

      When LifecycleTransaction is used in a multi-threaded context, we encounter this exception:

      java.util.ConcurrentModificationException: null
      at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
      at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
      at java.lang.Iterable.forEach(Iterable.java:74)
      at org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
      at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
      at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
      at org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
      at org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
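
The top frames of the trace are the fail-fast check in LinkedHashMap's iterator. As a minimal sketch of that mechanism, the following reproduces the same exception type deterministically on a single thread by modifying a LinkedHashMap while iterating over its key set. The class name, method name, and map contents are hypothetical and are not Cassandra code; in the ticket the modification comes from another thread rather than from the loop body.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

public class CmeDemo {
    /** Returns true if iterating while mutating raised ConcurrentModificationException. */
    static boolean mutateWhileIterating() {
        // Hypothetical stand-in for a map of log replicas keyed by directory.
        Map<String, String> replicas = new LinkedHashMap<>();
        replicas.put("dir-a", "log-a");
        replicas.put("dir-b", "log-b");
        try {
            for (String dir : replicas.keySet()) {
                // Structural modification while the iterator is live: the next
                // call to Iterator.next() sees a changed modCount and throws.
                replicas.put(dir + "-new", "log");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME raised: " + mutateWhileIterating());
    }
}
```

With two threads the same check fires nondeterministically, which is why the failure shows up only intermittently during parallel streaming.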

      During streaming we create a reference to a LifecycleTransaction and share it between threads:

      https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156

      This is used in a multi-threaded context inside CassandraIncomingFile, which is an IncomingStreamMessage. This is being deserialized in parallel.

      LifecycleTransaction is not meant to be used in a multi-threaded context, and sharing the object causes streaming failures. On trunk, it is shared across all threads that transfer sstables in parallel for a given TableId in a StreamSession. There are three options to solve this: (1) make LifecycleTransaction and its associated objects thread safe; (2) scope the transaction to a single CassandraIncomingFile; (3) synchronize access in the streaming infrastructure. The consequence of option 2 is that a streaming failure may leave redundant SSTables on disk. This is acceptable, as compaction should clean them up.
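
The option of synchronizing access can be sketched as a thin wrapper that funnels every call through one monitor. This is only an illustration under stated assumptions: ToyTransaction, SynchronizedTransaction, and runParallel are hypothetical names, not Cassandra's classes or the fix that was committed, and the toy trackNew merely imitates the iterate-then-mutate shape of the failing call path.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedTxnSketch {
    /** Hypothetical non-thread-safe transaction; NOT Cassandra's LifecycleTransaction. */
    static final class ToyTransaction {
        private final Map<String, Boolean> tracked = new LinkedHashMap<>();

        void trackNew(String sstable) {
            // Iterate, then mutate: unsafe if two threads interleave here.
            tracked.keySet().forEach(k -> { });
            tracked.put(sstable, Boolean.TRUE);
        }

        int size() {
            return tracked.size();
        }
    }

    /** Wrapper that serialises all access to the delegate on a single monitor. */
    static final class SynchronizedTransaction {
        private final ToyTransaction delegate = new ToyTransaction();

        synchronized void trackNew(String sstable) {
            delegate.trackNew(sstable);
        }

        synchronized int size() {
            return delegate.size();
        }
    }

    /** Shares one wrapped transaction across several workers and returns the tracked count. */
    static int runParallel(int threads, int perThread) {
        SynchronizedTransaction txn = new SynchronizedTransaction();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    txn.trackNew("sstable-" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return txn.size();
    }

    public static void main(String[] args) {
        System.out.println(runParallel(4, 1000));
    }
}
```

The trade-off is that concurrent streams contend on one lock for each tracked sstable, whereas scoping a transaction to each incoming file avoids sharing altogether at the cost of possible leftover SSTables after a failure.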

            People

              stefania Stefania Alborghetti
              djoshi Dinesh Joshi
              Stefania Alborghetti
              Benedict Elliott Smith, Robert Stupp