CASSANDRA-7734

Schema pushes (seemingly) randomly not happening

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 2.0.11, 2.1.1
    • None
    • None
    • Normal

    Description

      We have been seeing problems since upgrading from 2.0.5 to 2.0.9.

      After a while, new schema changes (we periodically add tables) start propagating very slowly to some nodes and quickly to others. From the logs and traces, it looks as though in these cases the "push" of the schema from the originating node to some of the other nodes never happens (note that once a node has decided not to push to another node, it doesn't seem to start again). We do, however, see the other node end up pulling the schema some time later, when it notices that its schema is out of date.

      Here is the code from MigrationManager.announce in 2.0.9:

              for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
              {
                  // only push schema to nodes with known and equal versions
                  if (!endpoint.equals(FBUtilities.getBroadcastAddress()) &&
                          MessagingService.instance().knowsVersion(endpoint) &&
                          MessagingService.instance().getRawVersion(endpoint) == MessagingService.current_version)
                      pushSchemaMutation(endpoint, schema);
              }
      

      and from 2.0.5

              for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
              {
                  if (endpoint.equals(FBUtilities.getBroadcastAddress()))
                      continue; // we've dealt with localhost already
      
                  // don't send schema to the nodes with the versions older than current major
                  if (MessagingService.instance().getVersion(endpoint) < MessagingService.current_version)
                      continue;
      
                  pushSchemaMutation(endpoint, schema);
              }
      

      The old getVersion() call would return MessagingService.current_version if the version was unknown, so the push would still occur in that case. I don't have logging to prove this, but I strongly suspect that the version may end up null (unknown) in some cases. That would have allowed schema propagation in 2.0.5, but not in some later release up to and including 2.0.9.
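
      To make the suspected difference concrete, here is a minimal, self-contained sketch of the two guards. It is not Cassandra code: the class SchemaPushSketch, the methods pushes205 and pushes209, and the CURRENT_VERSION value are hypothetical stand-ins, and a null Integer models "version unknown" as assumed in the description above.

```java
public class SchemaPushSketch {
    // Stand-in for MessagingService.current_version; the value is arbitrary.
    static final int CURRENT_VERSION = 7;

    // 2.0.5-style guard: an unknown version (null) is treated as current,
    // so only endpoints known to be older are skipped.
    static boolean pushes205(Integer knownVersion) {
        int version = (knownVersion == null) ? CURRENT_VERSION : knownVersion;
        return !(version < CURRENT_VERSION);
    }

    // 2.0.9-style guard: the version must be known AND equal to current,
    // so an endpoint with an unknown version is skipped.
    static boolean pushes209(Integer knownVersion) {
        return knownVersion != null && knownVersion == CURRENT_VERSION;
    }

    public static void main(String[] args) {
        Integer[] cases = { CURRENT_VERSION, CURRENT_VERSION - 1, null };
        for (Integer v : cases) {
            System.out.println("version=" + v
                    + " push(2.0.5 guard)=" + pushes205(v)
                    + " push(2.0.9 guard)=" + pushes209(v));
        }
    }
}
```

      Under these assumptions the two guards agree for a known current or known older version and differ only for the unknown case, which is the case suspected here.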

      Attachments

        1. 7734.txt
          4 kB
          Aleksey Yeschenko

            People

              aleksey Aleksey Yeschenko
              graham sanderson graham sanderson
              Aleksey Yeschenko
              Marcus Eriksson