CASSANDRA-14375

Digest mismatch Exception when sending raw hints in cluster

    Details

    • Type: Bug
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Consistency/Hints
    • Labels:
      None
    • Environment:

      CentOS 7.3

    • Severity:
      Normal

      Description

      We have a 14-node cluster where we have seen hints files getting corrupted, resulting in the following error:

      ERROR [HintsDispatcher:1] 2018-04-06 16:26:44,423 CassandraDaemon.java:228 - Exception in thread Thread[HintsDispatcher:1,1,main]
       org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:298) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:263) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:169) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:128) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:113) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:94) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:278) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:260) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:238) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:217) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_141]
       at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_141]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_141]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_141]
       at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141]
       Caused by: java.io.IOException: Digest mismatch exception
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:315) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:289) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       ... 16 common frames omitted
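
      For readers unfamiliar with this class of error: the message indicates that a checksum stored alongside the hint data did not match the checksum recomputed when the data was read back. The following is a minimal, self-contained sketch of that kind of check. The record layout (length, payload, CRC32) and the names `DigestCheckSketch`, `frame`, and `readVerified` are assumptions made for illustration only; this is not Cassandra's actual hints file format or the `HintsReader` implementation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class DigestCheckSketch {

    /** Hypothetical layout: [4-byte length][payload][4-byte CRC32 of payload]. */
    static ByteBuffer frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer out = ByteBuffer.allocate(4 + payload.length + 4);
        out.putInt(payload.length);
        out.put(payload);
        out.putInt((int) crc.getValue());
        out.flip();
        return out;
    }

    /** Reads one record and fails if the stored and recomputed checksums differ. */
    static byte[] readVerified(ByteBuffer in) throws IOException {
        int length = in.getInt();
        if (length < 0 || length > in.remaining() - 4) {
            throw new IOException("Truncated or corrupt record");
        }
        byte[] payload = new byte[length];
        in.get(payload);
        int stored = in.getInt();

        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if ((int) crc.getValue() != stored) {
            // Same message text as in the report; the real check lives in HintsReader.
            throw new IOException("Digest mismatch exception");
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hint-payload".getBytes(StandardCharsets.UTF_8);

        // An intact record passes verification.
        byte[] read = readVerified(frame(payload));
        System.out.println("intact record: " + new String(read, StandardCharsets.UTF_8));

        // Flip one bit in the payload to simulate on-disk corruption.
        ByteBuffer corrupted = frame(payload);
        corrupted.put(4, (byte) (corrupted.get(4) ^ 0x01));
        try {
            readVerified(corrupted);
        } catch (IOException e) {
            System.out.println("corrupted record: " + e.getMessage());
        }
    }
}
```

      In this sketch a single flipped bit is enough to trigger the failure, which is consistent with a reader rejecting a hints file whose contents changed after they were written.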
      

      Notes on cluster and investigation done so far
      1. The Cassandra build used here was built locally from the 3.11.1 branch, with the following patch from CASSANDRA-14080 applied:
      https://github.com/apache/cassandra/commit/68079e4b2ed4e58dbede70af45414b3d4214e195
      2. The 14 nodes are bootstrapped as follows:

      • Of the 14 nodes, only 3 are chosen as seed nodes.
      • Only 1 of the 3 seed nodes is started first, and the schema is created if it does not already exist.
      • After this, the rest of the nodes are bootstrapped.
      • In the failure scenario, only 5 of the 14 nodes successfully formed the Cassandra cluster. The failed nodes include two seed nodes.

      3. We confirmed that the patch from CASSANDRA-13696 has been applied. Jay Zhuang confirmed that this is a different issue from the one fixed previously:
      "this should be a different issue, as HintsDispatcher.java:128 sends hints with buffers, this patch is only to fix the digest mismatch for HintsDispatcher.java:129, which sends hints one by one."
      4. The application uses the Java driver with the QUORUM consistency setting for Cassandra.
      5. We saw this issue on a 7-node cluster too (a different cluster from the 14-node one).
      6. We are able to work around the problem by running nodetool truncatehints on the failed nodes and restarting Cassandra.

            People

            • Assignee:
              Unassigned
              Reporter:
              vinegh Vineet Ghatge