[ROCKETMQ-184] It takes too long (3-33 seconds) to switch to read from slave when master crashes - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Later
Affects Version/s: None
Fix Version/s: 4.2.0
Component/s: rocketmq-client, rocketmq-remoting
Labels: None

Description

When the master crashes, no notifier callback is triggered to pull messages again.

Instead, the consumer relies on the scan service to time out the pending pull request and then re-pull.

But the pull command has a 30-second timeout, and after the timeout the next pull is scheduled 3 seconds later.

So it takes 3 to 33 seconds to switch to the slave, depending on how long the in-flight pull had already been waiting when the master crashed. This is too long and can be optimized.
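The 3-to-33-second window follows from adding the remaining part of the 30-second pull timeout to the 3-second retry delay. A minimal sketch of that arithmetic, assuming the two values stated above; the class, constant, and method names are hypothetical and are not part of the RocketMQ code base:

```java
import java.time.Duration;

public class FailoverWindow {
    // Values taken from the description above; the names are illustrative only.
    public static final Duration PULL_TIMEOUT = Duration.ofSeconds(30);
    public static final Duration RETRY_DELAY = Duration.ofSeconds(3);

    /**
     * Time between the master crash and the next pull attempt, given how long
     * the in-flight pull request had already been waiting when the crash happened.
     */
    public static Duration switchDelay(Duration elapsedAtCrash) {
        if (elapsedAtCrash.isNegative() || elapsedAtCrash.compareTo(PULL_TIMEOUT) > 0) {
            throw new IllegalArgumentException("elapsed time must be within the pull timeout");
        }
        // The pending request is only noticed once its timeout expires ...
        Duration untilTimeout = PULL_TIMEOUT.minus(elapsedAtCrash);
        // ... and the re-pull is then scheduled after the fixed retry delay.
        return untilTimeout.plus(RETRY_DELAY);
    }

    public static void main(String[] args) {
        // Worst case: the pull was sent just before the crash.
        System.out.println(switchDelay(Duration.ZERO).getSeconds());   // 33
        // Best case: the pull was about to time out anyway.
        System.out.println(switchDelay(PULL_TIMEOUT).getSeconds());    // 3
    }
}
```

Under these assumptions, only the timeout portion varies with the moment of the crash; the 3-second retry delay is paid in every case.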

The root cause is that the re-pull below takes too long to be triggered when the master crashes:

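            @Override
            public void onException(Throwable e) {
                // Reached when the pull fails, e.g. the request to the crashed master times out.
                if (!pullRequest.getMessageQueue().getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
                    log.warn("execute the pull request exception", e);
                }

                // The re-pull is not issued immediately; it is scheduled after a fixed delay.
                DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest, PULL_TIME_DELAY_MILLS_WHEN_EXCEPTION);
            }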

Issue Links

links to GitHub Pull Request #95

People

Assignee: Xiaorui Wang

Reporter: Jaskey Lam

Votes: 0

Watchers: 3

Dates

Created: 19/Apr/17 08:32

Updated: 13/Dec/17 12:34

Resolved: 13/Dec/17 12:34