  Cassandra / CASSANDRA-6648

Race condition during node bootstrapping


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Urgent
    • Resolution: Fixed
    • Fix Version/s: 1.2.15, 2.0.5
    • None
    • Severity: Critical

    Description

      When bootstrapping a new node, data appears to be "missing", as if the new node hadn't actually bootstrapped. I tracked this down to the following scenario:

      1) The new node joins the token ring and waits for the schema to settle before actually bootstrapping.
      2) The schema check passes prematurely, and the node starts bootstrapping.
      3) Bootstrapping doesn't find the keyspace/column family it should have received from the other node.
      4) Queries at this point cause NullPointerExceptions; later they "recover", but the data has been missed.
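The ordering in steps 1 to 4 can be replayed deterministically in a few lines. This is a simplified model, not Cassandra code: the one-thread executor standing in for the migration stage, the schemaReceived flag, and a readiness check that only asks "is migration work running right now?" are all assumptions made for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BootstrapRaceDemo {
    // Hypothetical stand-in for the migration stage: a one-thread executor.
    private static final ThreadPoolExecutor migrationStage =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

    // Hypothetical stand-in for "the keyspace/column family has been received".
    private static volatile boolean schemaReceived = false;

    // Simplified readiness check (assumption): "no migration task is running at this instant".
    static boolean isReadyForBootstrap() {
        return migrationStage.getActiveCount() == 0;
    }

    // Replays one unlucky ordering; returns true if bootstrap would start without the schema.
    // Call it once only: it shuts the executor down.
    static boolean simulate() {
        // 1) Bootstrap checks readiness before any schema pull has been scheduled,
        //    so the migration stage is idle and the check passes.
        boolean bootstrappedWithoutSchema = isReadyForBootstrap() && !schemaReceived;

        // 2) Only afterwards is the schema pull submitted and completed.
        migrationStage.submit(() -> { schemaReceived = true; });
        migrationStage.shutdown();
        try {
            migrationStage.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return bootstrappedWithoutSchema;
    }

    public static void main(String[] args) {
        System.out.println("bootstrap started without schema: " + simulate());
    }
}
```

Running main prints `bootstrap started without schema: true`: in this model the check passes simply because the schema pull has not been scheduled yet, not because the schema has arrived.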

      The problem seems to be caused by a race condition between the migration manager and the bootstrapper: the schema migration completes only after bootstrapping has already started.
      I think this is supposed to protect against such scenarios:

                  while (!MigrationManager.isReadyForBootstrap())
                  {
                      setMode(Mode.JOINING, "waiting for schema information to complete", true);
                      Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
                  }
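For reference, a plain-JDK rendering of the loop above. It is a sketch: the waitForSchema name and its BooleanSupplier parameter are hypothetical, println stands in for setMode(), and an interrupt here cuts one sleep short rather than being fully absorbed as Guava's Uninterruptibles.sleepUninterruptibly would do (the interrupt flag is still restored at the end).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class SchemaWait {
    // Polls a readiness predicate, sleeping `interval` between checks, and returns
    // how many times it slept before the predicate reported "ready".
    static int waitForSchema(BooleanSupplier isReadyForBootstrap, long interval, TimeUnit unit) {
        int sleeps = 0;
        boolean interrupted = false;
        try {
            while (!isReadyForBootstrap.getAsBoolean()) {
                // Stand-in for setMode(Mode.JOINING, "...", true).
                System.out.println("JOINING: waiting for schema information to complete");
                try {
                    unit.sleep(interval);
                } catch (InterruptedException e) {
                    // Remember the interrupt and keep waiting; the flag is restored below.
                    interrupted = true;
                }
                sleeps++;
            }
            return sleeps;
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical predicate that turns true on its third call.
        int sleeps = waitForSchema(() -> ++calls[0] >= 3, 10, TimeUnit.MILLISECONDS);
        System.out.println("slept " + sleeps + " times");
    }
}
```

With a predicate that becomes true on its third call the method sleeps twice; with one that is already true it returns 0 without sleeping at all. The loop can only ever wait as long as the predicate keeps answering "not ready".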
      

      But the MigrationManager.isReadyForBootstrap() implementation is quite fragile: it doesn't account for "slow" schema propagation, so it can report readiness before the schema has actually arrived.
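One way to make such a predicate less fragile is to require positive evidence of schema agreement rather than merely the absence of running migration work. The sketch below is hypothetical and is not the patch attached to this issue: its inputs (the local schema version, the versions gossiped by live peers, a count of queued or running migration tasks) and the all-zero EMPTY_VERSION marker are assumptions.

```java
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

final class SchemaReadiness {
    // Hypothetical marker for "no schema loaded yet" (an assumption of this sketch).
    static final UUID EMPTY_VERSION = new UUID(0L, 0L);

    // Stricter readiness predicate (hypothetical). Ready only if:
    //  - at least one live peer has told us its schema version,
    //  - our own schema is no longer the empty/initial version,
    //  - no migration task is queued or running,
    //  - and every live peer reports the same version we hold.
    static boolean isReadyForBootstrap(UUID localVersion,
                                       Map<String, UUID> livePeerVersions,
                                       int pendingOrActiveMigrations) {
        if (livePeerVersions.isEmpty()) return false;          // gossip hasn't told us anything yet
        if (EMPTY_VERSION.equals(localVersion)) return false;  // nothing pulled yet
        if (pendingOrActiveMigrations > 0) return false;       // a pull is still in flight
        for (UUID peerVersion : livePeerVersions.values()) {
            if (!Objects.equals(peerVersion, localVersion)) {
                return false;                                  // schema disagreement
            }
        }
        return true;
    }
}
```

Each clause closes one gap: a node that has not yet heard from any peer, or has heard of a schema version it has not pulled yet, stays in the waiting loop instead of bootstrapping.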

      Attachments

        1. 6648-v2.txt (4 kB, Brandon Williams)
        2. 6648-v3.txt (6 kB, Brandon Williams)
        3. 6648-v3-1.2.txt (5 kB, Brandon Williams)
        4. CASSANDRA-6648.patch (3 kB, Sergio Bossa)


            People

              Sergio Bossa (sbtourist)
              Sergio Bossa (sbtourist)
              Sergio Bossa
              Tom Hobbs
              Ryan McGuire
              Votes: 0
              Watchers: 8
