  1. Cassandra
  2. CASSANDRA-10288

Incremental repair can hang if replicas aren't all up (was: Inconsistent behaviours on repair when a node in RF is missing)


Details

    • Normal

    Description

      With a cluster of 3 nodes and RF=3 for my keyspace, I tried to repair my data while a single node was down. I got 3 different behaviours depending on the C* version:

      cassandra-2.1: it fails saying a node is down. (acceptable)
      cassandra-2.2: it hangs forever (???)
      cassandra-3.0: it completes successfully

      What is the correct behaviour for this repair use case? Whatever the answer, the hang in cassandra-2.2 obviously has to be fixed.

      Here are the logs from testing each version:
      cassandra-2.1:

      ccmlib.node.NodetoolError: Nodetool command '/home/aboudreault/git/cstar/cassandra/bin/nodetool -h localhost -p 7100 repair test test' failed; exit status: 2; stdout: [2015-09-08 16:32:24,488] Starting repair command #3, repairing 3 ranges for keyspace test (parallelism=SEQUENTIAL, full=true)
      [2015-09-08 16:32:24,492] Repair session b69b5990-5668-11e5-b4ae-b3ffbc47f04c for range (3074457345618258602,-9223372036854775808] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.2) is dead: session failed
      [2015-09-08 16:32:24,493] Repair session b69b80a0-5668-11e5-b4ae-b3ffbc47f04c for range (-9223372036854775808,-3074457345618258603] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.2) is dead: session failed
      [2015-09-08 16:32:24,494] Repair session b69ba7b0-5668-11e5-b4ae-b3ffbc47f04c for range (-3074457345618258603,3074457345618258602] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.2) is dead: session failed
      [2015-09-08 16:32:24,494] Repair command #3 finished
      ; stderr: error: nodetool failed, check server logs
      -- StackTrace --
      java.lang.RuntimeException: nodetool failed, check server logs
              at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:291)
              at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:203)
      
      

      cassandra-2.2:

      It just hangs. I waited more than 10 minutes.
      

      cassandra-3.0:

      $ ccm node1 nodetool repair test test
      
      [2015-09-08 16:39:40,139] Starting repair command #1, repairing keyspace test with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [test], dataCenters: [], hosts: [], # of ranges: 2)
      [2015-09-08 16:39:40,241] Repair session ba4a1440-5669-11e5-bc8e-b3ffbc47f04c for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,3074457345618258602]] finished (progress: 80%)
      [2015-09-08 16:39:40,267] Repair completed successfully
      [2015-09-08 16:39:40,270] Repair command #1 finished in 0 seconds
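
      The three outcomes above (error exit, hang, success) can be told apart automatically by running the repair command under a timeout. The following is only a minimal sketch and is not taken from the attached script: the helper name `classify_repair` and the timeout value are hypothetical, and it assumes the wrapped command exits non-zero when the repair fails, as the 2.1 output above reports ("exit status: 2").

```python
import subprocess


def classify_repair(cmd, timeout_s):
    """Run `cmd` and return 'completed', 'failed', or 'hung'.

    Hypothetical helper for illustration only.
    'hung' means the command did not exit within `timeout_s` seconds.
    """
    try:
        # Output is discarded; only the exit status and the timeout matter here.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising.
        # Processes spawned by that child may need separate cleanup.
        return "hung"
    # Assumption: a non-zero exit status means the repair reported an error.
    return "completed" if result.returncode == 0 else "failed"
```

      For the case in this report, a call such as `classify_repair(["ccm", "node1", "nodetool", "repair", "test", "test"], timeout_s=600)` would be expected to return "hung" on an affected cassandra-2.2 build, since the command never returns there. The 600-second limit simply mirrors the "more than 10 minutes" wait described above.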
      

      Attachments

        1. repait_test.sh
          1 kB
          Alan Boudreault


          People

            Yuki Morishita (yukim)
            Alan Boudreault (aboudreault)
            Marcus Eriksson