CASSANDRA-8643

Merkle tree creation fails with NoSuchElementException


    Details

    • Type: Bug
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: 2.1.x
    • Component/s: None
    • Labels:
    • Environment:

      We are running a three-node cluster with a replication factor of three (C* 2.1.1). It uses a default C* installation and STCS.

    • Severity:
      Normal

      Description

      We encountered a problem during testing over the weekend.
      During the tests we noticed that repairs started to fail. The error has occurred on multiple non-coordinator nodes during repair. Repair also ran at least once without producing this error.

      We run repair -pr on all nodes on different days. CPU usage was around 40% and the disk was 50% full.

      From what I understand, the coordinator asked for Merkle trees from the other two nodes. However, one of the nodes fails to create its Merkle tree.
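
      For readers unfamiliar with the idea, the following is a minimal sketch of how a Merkle root lets two replicas compare a range of rows by exchanging a single hash. It is not Cassandra's implementation: the class name MerkleSketch, the use of SHA-256, the string rows, and the rule of pairing an odd node with itself are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the tree built by Cassandra's Validator.
public class MerkleSketch {

    // Hash the concatenation of the given byte arrays with SHA-256.
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) {
                md.update(p);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build the tree bottom-up and return the root hash.
    static byte[] root(List<String> rows) {
        if (rows.isEmpty()) {
            return sha256(); // hash of empty input
        }
        List<byte[]> level = new ArrayList<>();
        for (String row : rows) {
            level.add(sha256(row.getBytes(StandardCharsets.UTF_8)));
        }
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // Assumption: an unpaired node is combined with itself.
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                next.add(sha256(left, right));
            }
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        byte[] replicaA = root(Arrays.asList("k1=v1", "k2=v2", "k3=v3"));
        byte[] replicaB = root(Arrays.asList("k1=v1", "k2=STALE", "k3=v3"));
        // Differing roots signal that the replicas disagree somewhere in the range.
        System.out.println("in sync: " + Arrays.equals(replicaA, replicaB));
    }
}
```

      In this sketch, equal roots mean the two row sets hashed identically, while differing roots mean at least one row differs.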

      Unfortunately we do not have a way to reproduce this problem.

      The coordinator receives:

      2015-01-09T17:55:57.091+0100  INFO [RepairJobTask:4] RepairJob.java:145 [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for censored (to [/xx.90, /xx.98, /xx.82])
      2015-01-09T17:55:58.516+0100  INFO [AntiEntropyStage:1] RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] Received merkle tree for censored from /xx.90
      2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session completed with the following error
      org.apache.cassandra.exceptions.RepairException: [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
              at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
              at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
      2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] CassandraDaemon.java:153 Exception in thread Thread[AntiEntropySessions:76,5,RMI Runtime]
      java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
              at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
              at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
              at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
              at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
      Caused by: org.apache.cassandra.exceptions.RepairException: [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
              at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.1.jar:2.1.1]
              ... 3 common frames omitted
      

      While one of the other nodes produces this error:

      2015-01-09T17:55:59.574+0100 ERROR [ValidationExecutor:16] Validator.java:232 Failed creating a merkle tree for [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, (-6476420463551243930,-6471459119674373580]], /xx.82 (see log for details)
      2015-01-09T17:55:59.578+0100 ERROR [ValidationExecutor:16] CassandraDaemon.java:153 Exception in thread Thread[ValidationExecutor:16,1,main]
      java.util.NoSuchElementException: null
              at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:154) ~[guava-16.0.jar:na]
              at org.apache.cassandra.repair.Validator.add(Validator.java:137) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:930) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557) ~[apache-cassandra-2.1.1.jar:2.1.1]
              at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
              at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
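
      The top frame shows an iterator's next() being called when no elements remain, which is when an Iterator throws NoSuchElementException. The trace does not say why the iterator was exhausted, so the sketch below only illustrates that failure mode; it is not the code in Validator.add. The Range class, both helper methods, and the second sub-range are hypothetical, and the range is modelled as half-open (left, right] to match the notation in the log.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not Cassandra's Validator or Range classes.
public class ExhaustedIteratorSketch {

    // Stand-in for a non-wrapping, half-open token range (left, right].
    static final class Range {
        final long left;
        final long right;

        Range(long left, long right) {
            this.left = left;
            this.right = right;
        }

        boolean contains(long token) {
            return token > left && token <= right;
        }
    }

    // Unguarded scan: keeps calling next() until a range contains the token.
    // If no range matches, next() runs off the end and throws NoSuchElementException.
    static Range findUnguarded(Iterator<Range> ranges, long token) {
        Range r = ranges.next();
        while (!r.contains(token)) {
            r = ranges.next();
        }
        return r;
    }

    // Guarded scan: checks hasNext() first and returns null when nothing matches.
    static Range findGuarded(Iterator<Range> ranges, long token) {
        while (ranges.hasNext()) {
            Range r = ranges.next();
            if (r.contains(token)) {
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The first range uses the bounds from the log; the second is made up.
        List<Range> ranges = Arrays.asList(
                new Range(-6476420463551243930L, -6471459119674373580L),
                new Range(-6471459119674373580L, -6470000000000000000L));
        long outside = 0L; // a token that falls in neither range

        System.out.println("guarded result: " + findGuarded(ranges.iterator(), outside));
        try {
            findUnguarded(ranges.iterator(), outside);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded scan threw " + e.getClass().getSimpleName());
        }
    }
}
```

      Whether a similar unguarded advance is what happened here would need to be checked against the 2.1.1 source at Validator.java:137.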
      
