Cassandra / CASSANDRA-5456

Large number of bootstrapping nodes cause gossip to stop working

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.11, 1.2.5
    • Component/s: None
    • Labels: None

      Description

      A long-running section of code in PendingRangeCalculatorService is synchronized on bootstrapTokens. When a large number of nodes (hundreds in our case) are bootstrapping, gossip stops working because it waits for the same lock. Consequently, the whole cluster becomes non-functional.

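      I experimented with the following change in PendingRangeCalculatorService.java, and it resolved the problem in our case. The prior code held the synchronized block around the entire for loop; the change copies bootstrapTokens while holding the lock and then iterates over the copy.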

      // Hold the lock only long enough to copy the map.
      synchronized(bootstrapTokens) {
          bootstrapTokens = new LinkedHashMap<Token, InetAddress>(bootstrapTokens);
      }

      // Iterate over the copy without holding the lock.
      for (Map.Entry<Token, InetAddress> entry : bootstrapTokens.entrySet())
      {
          InetAddress endpoint = entry.getValue();

          allLeftMetadata.updateNormalToken(entry.getKey(), endpoint);
          for (Range<Token> range : strategy.getAddressRanges(allLeftMetadata).get(endpoint))
              pendingRanges.put(range, endpoint);
          allLeftMetadata.removeEndpoint(endpoint);
      }
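
The copy-then-iterate idea above can be illustrated in isolation. The following is a minimal, self-contained sketch, not the Cassandra code: the class BootstrapTokenSnapshot, its methods, and the use of String keys and values in place of Token and InetAddress are all hypothetical stand-ins chosen for illustration. It assumes that writers (standing in for gossip) lock the same map object, and that the slow per-entry work only needs a point-in-time view of the map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Cassandra.
final class BootstrapTokenSnapshot
{
    // Shared map (token -> endpoint), guarded by its own monitor.
    private final Map<String, String> bootstrapTokens = new LinkedHashMap<String, String>();

    // Writer path (stand-in for gossip): needs the lock only briefly.
    void addBootstrapToken(String token, String endpoint)
    {
        synchronized (bootstrapTokens)
        {
            bootstrapTokens.put(token, endpoint);
        }
    }

    // Copy the map while holding the lock; the lock is released on return.
    Map<String, String> snapshot()
    {
        synchronized (bootstrapTokens)
        {
            return new LinkedHashMap<String, String>(bootstrapTokens);
        }
    }

    // The slow loop runs on the private copy, so writers are not blocked by it.
    int processPending()
    {
        int processed = 0;
        for (Map.Entry<String, String> entry : snapshot().entrySet())
        {
            // Placeholder for the expensive per-endpoint range calculation.
            processed++;
        }
        return processed;
    }

    public static void main(String[] args)
    {
        BootstrapTokenSnapshot tokens = new BootstrapTokenSnapshot();
        tokens.addBootstrapToken("t1", "10.0.0.1");
        Map<String, String> copy = tokens.snapshot();
        tokens.addBootstrapToken("t2", "10.0.0.2");
        // The earlier copy does not see the later write.
        System.out.println(copy.size() + " " + tokens.processPending());
    }
}
```

The trade-off in this sketch is that the loop may work from a slightly stale view: entries added after the copy is taken are picked up only on the next calculation, in exchange for not blocking writers during the loop.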

        Activity

        brandon.williams Brandon Williams added a comment -

        Committed, thanks!

        okibirev Oleg Kibirev added a comment -

        Making a copy of bootstrapTokens before a time-consuming loop rather than holding a synchronized lock for the whole duration.

        okibirev Oleg Kibirev added a comment -

        Making a copy of bootstrapTokens rather than holding a lock on it for the entire time-consuming loop.


          People

          • Assignee:
            okibirev Oleg Kibirev
            Reporter:
            okibirev Oleg Kibirev
            Reviewer:
            Brandon Williams
          • Votes:
            0
            Watchers:
            2
