[CASSANDRA-5456] Large number of bootstrapping nodes cause gossip to stop working - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 1.1.11, 1.2.5
Component/s: None
Labels: None
Severity: Normal

Description

A long-running section of code in PendingRangeCalculatorService is synchronized on bootstrapTokens. When a large number of nodes (hundreds in our case) are bootstrapping, gossip stops working because it waits for the same lock. Consequently, the whole cluster becomes non-functional.

I experimented with the following change in PendingRangeCalculatorService.java, and it resolved the problem in our case. The prior code held the synchronized block around the entire for loop.

synchronized (bootstrapTokens)
{
    bootstrapTokens = new LinkedHashMap<Token, InetAddress>(bootstrapTokens);
}

for (Map.Entry<Token, InetAddress> entry : bootstrapTokens.entrySet())
{
    InetAddress endpoint = entry.getValue();

    allLeftMetadata.updateNormalToken(entry.getKey(), endpoint);
    for (Range<Token> range : strategy.getAddressRanges(allLeftMetadata).get(endpoint))
        pendingRanges.put(range, endpoint);
    allLeftMetadata.removeEndpoint(endpoint);
}
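
The idea behind the change above (copy the shared map while holding the lock, then do the slow iteration on the copy) can be sketched in isolation. This is a minimal illustration, not the Cassandra code: the class SnapshotUnderLock, its methods, and the use of plain strings in place of Token and InetAddress are all assumptions made for the example, and the copy is stored in a separate local variable rather than reassigned to the same name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the pattern in the ticket; not part of Cassandra.
public class SnapshotUnderLock
{
    // Shared map guarded by its own monitor (strings stand in for Token / InetAddress).
    private final Map<String, String> bootstrapTokens = new LinkedHashMap<>();

    // Writer path (gossip, in the ticket's scenario): holds the lock only for the put.
    public void addBootstrapToken(String token, String endpoint)
    {
        synchronized (bootstrapTokens)
        {
            bootstrapTokens.put(token, endpoint);
        }
    }

    // Reader path: the lock is held only long enough to copy the map.
    public List<String> calculatePending()
    {
        Map<String, String> snapshot;
        synchronized (bootstrapTokens)
        {
            snapshot = new LinkedHashMap<>(bootstrapTokens);
        }

        // The potentially slow per-entry work runs on the private copy,
        // so writers are not blocked while it proceeds.
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, String> entry : snapshot.entrySet())
            pending.add(entry.getKey() + "->" + entry.getValue());
        return pending;
    }

    public static void main(String[] args)
    {
        SnapshotUnderLock service = new SnapshotUnderLock();
        service.addBootstrapToken("token-1", "10.0.0.1");
        service.addBootstrapToken("token-2", "10.0.0.2");
        System.out.println(service.calculatePending());
    }
}
```

The trade-off is that the loop works on a point-in-time view: entries added after the copy is taken are not seen until the next calculation.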

Attachments

PendingRangeCalculatorService.patch
11/Apr/13 20:07
1 kB
Oleg Kibirev

People

Assignee: Oleg Kibirev

Reporter: Oleg Kibirev

Authors: Oleg Kibirev

Reviewers: Brandon Williams

Votes: 0

Watchers: 2

Dates

Created: 11/Apr/13 18:44

Updated: 16/Apr/19 09:32

Resolved: 12/Apr/13 17:13