  1. Cassandra
  2. CASSANDRA-10068

Batchlog replay fails with exception after a node is decommissioned

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Cannot Reproduce
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Severity: Normal

      Description

      This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test.

      At the conclusion of the test, a batchlog replay is initiated through nodetool and hits the following assertion due to a missing host ID: https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197

      A nodetool status on the node with failed batchlog replay shows the following entry for the decommissioned node:
      DN 10.0.0.5 ? 256 ? null rack1

      On the unaffected nodes, there is no entry for the decommissioned node as expected.
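
A minimal sketch of how the stale entry could be spotted automatically, assuming `nodetool status` output has been captured as text. The function name `find_null_host_ids` and the sample rows are hypothetical. The column positions are inferred from the single row quoted above (host ID second from last, rack last), so treat the parsing as an assumption rather than a description of the tool's guaranteed format.

```python
def find_null_host_ids(status_output):
    """Return addresses of node rows whose Host ID column reads 'null'.

    Assumes each node row starts with a two-letter status/state code
    (e.g. UN, DN) and ends with "<host id> <rack>", as in the row
    quoted in this report.
    """
    suspects = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[0]
        # Skip headers and legend lines; keep only rows like "UN ..." or "DN ...".
        if len(code) != 2 or code[0] not in "UD" or code[1] not in "NLJM":
            continue
        address, host_id = fields[1], fields[-2]
        if host_id == "null":
            suspects.append(address)
    return suspects


# Hypothetical sample: the UN row and its host ID are placeholders;
# the DN row is the entry quoted in this report.
sample = """\
--  Address   Load      Tokens  Owns  Host ID                               Rack
UN  10.0.0.1  105.3 KB  256     ?     00000000-0000-0000-0000-000000000001  rack1
DN  10.0.0.5  ?         256     ?     null                                  rack1
"""

print(find_null_host_ids(sample))
```

Reading the host ID from the end of the row rather than by fixed index keeps the sketch working whether the Load column is one token (`?`) or two (`105.3 KB`).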

      The logs of other nodes show occasional hits of the same assertion. The issue appears to sometimes resolve itself, but one node seems to keep the errant null entry indefinitely.

      In logs for the nodes, this possibly unrelated exception also appears:
      java.lang.RuntimeException: Trying to get the view natural endpoint on a non-data replica
      at org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]

      I have a running cluster exhibiting the issue on my machine, and the failure is repeatable.

      Nothing stands out in the logs of the decommissioned node (n4) for me. The logs of each node in the cluster are attached.

        Attachments

        1. n5.log
          330 kB
          Joel Knighton
        2. n4.log
          282 kB
          Joel Knighton
        3. n3.log
          906 kB
          Joel Knighton
        4. n2.log
          762 kB
          Joel Knighton
        5. n1.log
          785 kB
          Joel Knighton

            People

            • Assignee: Branimir Lambov (blambov)
            • Reporter: Joel Knighton (jkni)
            • Authors: Branimir Lambov
