[IGNITE-8006] Starting multiple caches slows down exchange process on joining node

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.8
Component/s: None
Labels: None

Description

In some cases, when many caches are started (more than 2,000), the exchange process can stall while a new node is joining the cluster.

The coordinator node waits to receive a single message from every other node, but the last node (the one joining the cluster) is still busy starting caches:

Stack trace
 at java.lang.Thread.dumpStack(Thread.java:1329)
 at org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCache(GridCacheProcessor.java:1159)
 at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1900)
 at org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCachesOnLocalJoin(GridCacheProcessor.java:1764)
 at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:740)
 at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:622)
 at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2329)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 at java.lang.Thread.run(Thread.java:745)

This blocks the cluster exchange process until all caches have started on the joining node.

We should either start caches in parallel threads or move cache start out of the exchange init process.
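
As an illustration of the first option only, here is a minimal sketch of starting many caches on a fixed thread pool instead of one by one on the exchange worker thread. It uses plain `java.util.concurrent` and is not the change made for this issue. The class `ParallelCacheStartSketch`, the methods `startOneCache` and `startAll`, and the thread count are all hypothetical names chosen for the example; `startOneCache` merely stands in for the per-cache work done in `GridCacheProcessor.startCache`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCacheStartSketch {
    // Hypothetical stand-in for the per-cache start work; not an Ignite API.
    static String startOneCache(String cacheName) {
        return cacheName;
    }

    // Starts every cache on a fixed pool and returns the names in submission order.
    static List<String> startAll(List<String> cacheNames, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String name : cacheNames) {
                tasks.add(() -> startOneCache(name));
            }

            List<String> started = new ArrayList<>();
            // invokeAll blocks until all tasks have completed.
            for (Future<String> f : pool.invokeAll(tasks)) {
                // get() rethrows a failed start as ExecutionException.
                started.add(f.get());
            }
            return started;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            names.add("cache-" + i);
        }
        List<String> started = startAll(names, 4);
        System.out.println("Started " + started.size() + " caches");
    }
}
```

The caller still waits for all caches, so this shortens the blocking window rather than removing it. Taking cache start out of exchange init entirely is the second option named above.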

Issue Links

causes

IGNITE-9729 Ability to start GridQueryProcessor in parallel

Open

is blocked by

IGNITE-5795 Binary metadata is not registered during start of cache

Resolved

is related to

IGNITE-10228 Start multiple caches in parallel may lead to the fact that some of the caches won't be registered.

Resolved

links to

GitHub Pull Request #4752

TC All

UpSource

People

Assignee: Anton Kalashnikov

Reporter: Vladislav Pyatkov

Votes: 0

Watchers: 6

Dates

Created: 21/Mar/18 14:00

Updated: 12/Nov/18 18:25

Resolved: 22/Oct/18 13:33