Apache Cassandra / CASSANDRA-19260

org.apache.cassandra.tcm.ClusterMetadataService#commit does not catch up when rejected


Details

    Description

      This was found on the cep-15-accord branch (CASSANDRA-18804) by a simple benchmark test:

      1) Deploy a 6-node cluster
      2) Create a table
      3) Launch many Accord transactions in parallel

      When Accord receives a transaction, it needs to make sure the table is “managed” by Accord. Accord uses TCM for this bookkeeping; the managed set is just a List<TableId> in ClusterMetadata. We found that when we detect that the table isn’t managed and try to add it, we get a rejection, yet the TCM epoch has not moved forward!

      Debugging this, org.apache.cassandra.tcm.RemoteProcessor#commit looks like the root cause: it only seems to try to catch up after a messaging error, not after a TCM rejection. Because the caller cannot find an epoch to “wait” on, I feel that this is a TCM issue: TCM normally tries to make sure both successes and rejections are blocking, but in this one case it appears not to be.
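
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Log`, `LocalView`, `Result`, and the `commit(...)` signature with its `catchUpOnReject` flag are hypothetical stand-ins, not the real `org.apache.cassandra.tcm` classes. It models the epoch as the number of accepted changes and shows that a rejected commit leaves the caller on a stale epoch unless it also catches up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. Log, LocalView and commit(...) are hypothetical stand-ins,
// not the real org.apache.cassandra.tcm classes or signatures.
public class CommitCatchUpSketch {

    /** Authoritative metadata log: each accepted change bumps the epoch by one. */
    static final class Log {
        private final List<String> managedTables = new ArrayList<>();

        /** Returns true if the table was added, false if the change is rejected. */
        synchronized boolean tryAddManagedTable(String tableId) {
            if (managedTables.contains(tableId))
                return false; // already managed by an earlier commit: reject
            managedTables.add(tableId);
            return true;
        }

        /** Copies the current epoch and table list into a node's local view. */
        synchronized void replicateTo(LocalView view) {
            view.epoch = managedTables.size();
            view.managedTables = new ArrayList<>(managedTables);
        }
    }

    /** One node's local, possibly stale, copy of the metadata. */
    static final class LocalView {
        long epoch = 0;
        List<String> managedTables = new ArrayList<>();
    }

    enum Result { SUCCESS, REJECTED }

    /**
     * Submits "mark tableId as managed" to the log.
     * With catchUpOnReject == false, a rejection leaves the local view untouched,
     * which is the behaviour reported in this ticket.
     */
    static Result commit(Log log, LocalView local, String tableId, boolean catchUpOnReject) {
        if (log.tryAddManagedTable(tableId)) {
            log.replicateTo(local); // success: local view advances to the new epoch
            return Result.SUCCESS;
        }
        if (catchUpOnReject)
            log.replicateTo(local); // rejection: still learn why we were rejected
        return Result.REJECTED;
    }

    public static void main(String[] args) {
        Log log = new Log();
        LocalView winner = new LocalView();
        LocalView stale = new LocalView();
        LocalView caughtUp = new LocalView();

        System.out.println(commit(log, winner, "tbl", false) + " epoch=" + winner.epoch);
        System.out.println(commit(log, stale, "tbl", false) + " epoch=" + stale.epoch);
        System.out.println(commit(log, caughtUp, "tbl", true) + " epoch=" + caughtUp.epoch);
    }
}
```

Running `main` prints `SUCCESS epoch=1`, `REJECTED epoch=0`, and `REJECTED epoch=1`. The second line is the problematic case: the caller is told the change was rejected but still sees the table as unmanaged at epoch 0, so it has no epoch to wait on. Whether the real fix belongs in the commit path or in the caller is not something this sketch decides.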

      Attachments

        1. ci_summary.html
          7 kB
          Alex Petrov

        Issue Links

          Activity

            People

              ifesdjeen Alex Petrov
              dcapwell David Capwell
              Alex Petrov
              Alex Petrov, Sam Tunnicliffe
              Votes: 0
              Watchers: 3
