[COUCHDB-3376] Fix mem3_shards under load - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Two issues with mem3_shards were found and fixed while I was testing the PSE code.

The first issue, found by jaydoane, is that a database can have its shards inserted into the cache after it has been deleted. This can happen if a client does a rapid CREATE/DELETE/GET cycle on a database. The fix is to track the update sequence seen by the changes feed listener and only insert shard maps that come from a client that has read an update_seq at least as recent as the one mem3_shards has seen.
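As a rough illustration of the sequence check described above, here is a minimal Python sketch. It is not the mem3_shards code (which is Erlang); the class `ShardCache` and the methods `on_change` and `maybe_insert` are hypothetical names invented for this example, and update sequences are modeled as plain integers.

```python
class ShardCache:
    """Toy model of a shard-map cache guarded by a changes-feed sequence.

    All names here are hypothetical; this only mirrors the idea in the text.
    """

    def __init__(self):
        self.update_seq = 0   # last sequence seen by the changes listener
        self.shards = {}      # db name -> shard map

    def on_change(self, seq, db_name, deleted):
        # Called by the (hypothetical) changes feed listener.
        self.update_seq = max(self.update_seq, seq)
        if deleted:
            self.shards.pop(db_name, None)

    def maybe_insert(self, db_name, shard_map, client_seq):
        # Reject shard maps read before changes the cache already knows about.
        if client_seq < self.update_seq:
            return False
        self.shards[db_name] = shard_map
        return True


cache = ShardCache()
cache.on_change(1, "db1", deleted=False)   # CREATE
stale_seq = 1                              # client reads the shard map at seq 1
cache.on_change(2, "db1", deleted=True)    # DELETE is seen before the insert
stale_accepted = cache.maybe_insert("db1", ["shard-a"], stale_seq)
fresh_accepted = cache.maybe_insert("db2", ["shard-b"], 2)
```

In this model the insert for the deleted database is dropped because the client read its shard map at an older sequence than the cache has already observed, while an insert from an up-to-date client goes through.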

The second issue, found during heavy benchmarking, is that large shard maps (in the Q>=128 range) can quite easily cause mem3_shards to back up when a thundering herd attempts to open the database. There is no coordination among workers trying to add a shard map to the cache, so if a bunch of independent clients all send the shard map at once (say, at the beginning of a benchmark), mem3_shards can get overwhelmed. The fix was twofold. First, rather than send the shard map directly to mem3_shards, we copy it into a spawned process, and when/if mem3_shards wants to write it, it tells this writer process to do its business. Second, we create an ets table to track these writer processes. Independent clients can then check whether a shard map is already en route to mem3_shards by using ets:insert_new, canceling their own writer if that returns false.
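The writer deduplication can be sketched in Python as follows. This is an analogy under stated assumptions, not the actual fix: `WriterRegistry` stands in for the ets table (its `insert_new` mimics the succeed-only-if-key-is-absent behavior of ets:insert_new), `Writer` stands in for the spawned writer process, and `offer_shard_map` and `cache_accepts` are hypothetical helpers. The herd is simulated sequentially rather than with real concurrent clients.

```python
import threading


class WriterRegistry:
    """Stand-in for an ets table: insert_new succeeds only for a new key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._writers = {}

    def insert_new(self, key, value):
        with self._lock:
            if key in self._writers:
                return False
            self._writers[key] = value
            return True

    def take(self, key):
        with self._lock:
            return self._writers.pop(key, None)


class Writer:
    """Stand-in for the spawned process that holds a copy of the shard map."""

    def __init__(self, db_name, shard_map):
        self.db_name = db_name
        self.shard_map = shard_map
        self.cancelled = False

    def cancel(self):
        self.cancelled = True
        self.shard_map = None


def offer_shard_map(registry, db_name, shard_map):
    # Each client creates a writer, then cancels it if one is already en route.
    writer = Writer(db_name, shard_map)
    if not registry.insert_new(db_name, writer):
        writer.cancel()
        return None
    return writer


def cache_accepts(registry, cache, db_name):
    # The cache owner asks the single registered writer to do the write.
    writer = registry.take(db_name)
    if writer is not None and not writer.cancelled:
        cache[db_name] = writer.shard_map


registry, cache = WriterRegistry(), {}
writers = [offer_shard_map(registry, "db1", ["s1", "s2"]) for _ in range(100)]
live = [w for w in writers if w is not None]
cache_accepts(registry, cache, "db1")
```

Of the 100 simulated clients only the first keeps its writer, so the cache owner handles one shard map instead of 100 copies of the same large term.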

PR incoming.

People

Assignee: Unassigned

Reporter: Paul Joseph Davis

Votes: 0

Watchers: 2

Dates

Created: 14/Apr/17 22:20

Updated: 25/Apr/17 21:11

Resolved: 25/Apr/17 21:10