CASSANDRA-7734

Schema pushes (seemingly) randomly not happening

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.0.11, 2.1.1
    • Component/s: None
    • Labels:
      None

      Description

      We have been seeing problems since upgrading from 2.0.5 to 2.0.9.

      Basically, after a while, new schema changes (we periodically add tables) start propagating very slowly to some nodes and quickly to others. From the logs and traces, it looks like the "push" of the schema from the originating node to some of the other nodes never happens in this case (note that once a node has decided not to push to another node, it doesn't seem to start again). We do, however, see the other node end up pulling the schema some time later, when it notices that its schema is out of date.

      Here is the code from MigrationManager.announce in 2.0.9:

             for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
              {
                  // only push schema to nodes with known and equal versions
                  if (!endpoint.equals(FBUtilities.getBroadcastAddress()) &&
                          MessagingService.instance().knowsVersion(endpoint) &&
                          MessagingService.instance().getRawVersion(endpoint) == MessagingService.current_version)
                      pushSchemaMutation(endpoint, schema);
              }
      

      and from 2.0.5:

              for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
              {
                  if (endpoint.equals(FBUtilities.getBroadcastAddress()))
                      continue; // we've dealt with localhost already
      
                  // don't send schema to the nodes with the versions older than current major
                  if (MessagingService.instance().getVersion(endpoint) < MessagingService.current_version)
                      continue;
      
                  pushSchemaMutation(endpoint, schema);
              }
      

      The old getVersion() call would return MessagingService.current_version if the version was unknown, so the push would occur in this case. I don't have logging to prove this, but I strongly suspect that the version may end up null in some cases. That would have allowed schema propagation in 2.0.5, but no longer does as of some release after that and no later than 2.0.9.
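
      To make the suspected difference concrete, here is a minimal, self-contained sketch of the two checks. It is not Cassandra code: the class SchemaPushSketch, the methods pushes205 and pushes209, the map of known versions, and the value of CURRENT_VERSION are all hypothetical stand-ins, and the check for the local endpoint is left out. It only models the behavior described above, where an endpoint with no recorded version counts as current under the 2.0.5 logic but fails the knowsVersion() test under the 2.0.9 logic.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; none of these names exist in Cassandra.
class SchemaPushSketch {
    // Hypothetical stand-in for MessagingService.current_version; the value is arbitrary.
    static final int CURRENT_VERSION = 7;

    // 2.0.5-style lookup as described in this report:
    // an endpoint with no recorded version is reported as the current version.
    static int getVersion(Map<String, Integer> known, String endpoint) {
        Integer v = known.get(endpoint);
        return v == null ? CURRENT_VERSION : v;
    }

    // 2.0.5-style decision: skip only endpoints older than the current version.
    static boolean pushes205(Map<String, Integer> known, String endpoint) {
        return !(getVersion(known, endpoint) < CURRENT_VERSION);
    }

    // 2.0.9-style decision: push only when the version is known and equal.
    static boolean pushes209(Map<String, Integer> known, String endpoint) {
        Integer v = known.get(endpoint); // null models knowsVersion(endpoint) == false
        return v != null && v == CURRENT_VERSION;
    }

    public static void main(String[] args) {
        Map<String, Integer> known = new HashMap<>();
        known.put("10.0.0.1", CURRENT_VERSION); // version known and current
        // "10.0.0.2" is deliberately absent: its version is unknown.
        for (String endpoint : new String[] {"10.0.0.1", "10.0.0.2"}) {
            System.out.println(endpoint
                    + " 2.0.5 pushes=" + pushes205(known, endpoint)
                    + ", 2.0.9 pushes=" + pushes209(known, endpoint));
        }
    }
}
```

      In this model the endpoint with an unknown version still receives a push under the 2.0.5-style check but is skipped under the 2.0.9-style check, which matches the symptom of some nodes only catching up later via a pull.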

        Attachments

        1. 7734.txt
          4 kB
          Aleksey Yeschenko

              People

              • Assignee:
                iamaleksey Aleksey Yeschenko
                Reporter:
                graham sanderson graham sanderson
                Reviewer:
                Marcus Eriksson