Cassandra / CASSANDRA-11824

If repair fails no way to run repair again


Details

    • Type: Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • Fix Version/s: 2.1.15, 2.2.7, 3.0.7, 3.7
    • None
    • Normal

    Description

      I have a test that disables gossip and runs repair at the same time.

      WARN [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 StorageService.java:384 - Stopping gossip by operator request
      INFO [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 Gossiper.java:1463 - Announcing shutdown
      INFO [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,776 StorageService.java:1999 - Node /172.31.31.1 state jump to shutdown
      INFO [HANDSHAKE-/172.31.17.32] 2016-05-17 16:57:21,895 OutboundTcpConnection.java:514 - Handshaking version with /172.31.17.32
      INFO [HANDSHAKE-/172.31.24.76] 2016-05-17 16:57:21,895 OutboundTcpConnection.java:514 - Handshaking version with /172.31.24.76
      INFO [Thread-25] 2016-05-17 16:57:21,925 RepairRunnable.java:125 - Starting repair command #1, repairing keyspace keyspace1 with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
      INFO [Thread-26] 2016-05-17 16:57:21,953 RepairRunnable.java:125 - Starting repair command #2, repairing keyspace stresscql with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
      INFO [Thread-27] 2016-05-17 16:57:21,967 RepairRunnable.java:125 - Starting repair command #3, repairing keyspace system_traces with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 2)

      This ends up failing:

      16:54:44.844 INFO serverGroup-node-1-574 - STDOUT: [2016-05-17 16:57:21,933] Starting repair command #1, repairing keyspace keyspace1 with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
      [2016-05-17 16:57:21,943] Did not get positive replies from all endpoints. List of failed endpoint(s): [172.31.24.76, 172.31.17.32]
      [2016-05-17 16:57:21,945] null

      Subsequent calls to repair with all nodes up still fail:

      ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 CompactionManager.java:1193 - Cannot start multiple repair sessions over the same sstables
      ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 Validator.java:261 - Failed creating a merkle tree for [repair #66425f10-1c61-11e6-83b2-0b1fff7a067d on keyspace1/standard1,
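
      The symptom above (a later repair rejected with "Cannot start multiple repair sessions over the same sstables" after an earlier one failed) is what you would expect if state registered by the failed session were never released. The report does not state the root cause, so the following is only a toy model of that assumption, not Cassandra's code; the class RepairLockSketch and its methods start, finish and run are hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Toy model only: none of these names come from Cassandra.
public class RepairLockSketch {
    // sstable name -> id of the session currently holding it
    private final Map<String, UUID> held = new HashMap<>();

    /** Marks the sstables as held by a session; rejects any overlap with a live session. */
    public void start(UUID session, Set<String> sstables) {
        for (String s : sstables) {
            if (held.containsKey(s)) {
                throw new IllegalStateException(
                    "Cannot start multiple repair sessions over the same sstables");
            }
        }
        for (String s : sstables) {
            held.put(s, session);
        }
    }

    /** Releases everything held by a session. */
    public void finish(UUID session) {
        held.values().removeIf(session::equals);
    }

    /**
     * Runs one simulated session. When cleanupOnFailure is false, a failed
     * session keeps its sstables marked as held (the assumed bug).
     */
    public boolean run(Set<String> sstables, boolean succeeds, boolean cleanupOnFailure) {
        UUID session = UUID.randomUUID();
        start(session, sstables);
        if (succeeds || cleanupOnFailure) {
            finish(session);
        }
        return succeeds;
    }

    public static void main(String[] args) {
        Set<String> sstables = new HashSet<>();
        sstables.add("keyspace1/standard1-1");

        RepairLockSketch leaky = new RepairLockSketch();
        leaky.run(sstables, false, false);      // first repair fails, nothing released
        try {
            leaky.run(sstables, true, false);   // second repair is rejected
        } catch (IllegalStateException e) {
            System.out.println("leaky: " + e.getMessage());
        }

        RepairLockSketch cleaning = new RepairLockSketch();
        cleaning.run(sstables, false, true);    // first repair fails, state released
        System.out.println("cleaning: second repair ok = " + cleaning.run(sstables, true, true));
    }
}
```

      In this model, releasing the session's state on the failure path as well as the success path is what lets the second repair start.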

          People

            Assignee: Marcus Eriksson (marcuse)
            Reporter: T Jake Luciani (tjake)
            Authors: Marcus Eriksson
            Reviewers: Paulo Motta (Deprecated)
            Votes: 0
            Watchers: 13
