  1. Apache RocketMQ
  2. ROCKETMQ-184

It takes too long (3-33 seconds) to switch to reading from the slave when the master crashes


Details

    Description

      When the master crashes, no notifier callback is triggered to pull messages again.

      Instead, the consumer relies on the scan service to detect the timeout and then re-pull.

      But the pull command has a 30-second timeout, and after the timeout the next pull is scheduled 3 seconds later.

      So it takes 3 to 33 seconds to switch to the slave, which is too long and can be optimized.

      The root cause is that the re-pull below takes too long to be triggered when the master crashes:
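The 3-to-33-second range can be reproduced with a little arithmetic. The sketch below is only an illustration of the reasoning above, not RocketMQ code: the class name, method name, and constant names are hypothetical, and the two values (30 seconds and 3 seconds) are taken from this description. It assumes the delay depends only on how long the in-flight pull had already been waiting when the master crashed.

```java
public class FailoverLatency {
    // Values taken from the issue description; the names are illustrative,
    // not actual RocketMQ constants.
    static final long PULL_TIMEOUT_MILLIS = 30_000;
    static final long RETRY_DELAY_MILLIS = 3_000;

    /**
     * Time until the next pull is sent, given how long the in-flight pull
     * had already been waiting when the master crashed.
     */
    static long switchDelayMillis(long elapsedInPullMillis) {
        if (elapsedInPullMillis < 0 || elapsedInPullMillis > PULL_TIMEOUT_MILLIS) {
            throw new IllegalArgumentException("elapsed time must be within the pull timeout");
        }
        // The pull first has to run into its timeout...
        long remainingUntilTimeout = PULL_TIMEOUT_MILLIS - elapsedInPullMillis;
        // ...and only then is the next pull scheduled, after a fixed delay.
        return remainingUntilTimeout + RETRY_DELAY_MILLIS;
    }

    public static void main(String[] args) {
        // Best case: the master crashes just as the pull is about to time out.
        System.out.println(switchDelayMillis(PULL_TIMEOUT_MILLIS)); // 3000
        // Worst case: the master crashes right after the pull was sent.
        System.out.println(switchDelayMillis(0)); // 33000
    }
}
```

Under these assumptions the fixed 3-second delay is the floor, and the remainder of the 30-second timeout accounts for the rest of the wait.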

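                  @Override
                  public void onException(Throwable e) {
                      // Skip the warning for retry topics; log every other failed pull.
                      if (!pullRequest.getMessageQueue().getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
                          log.warn("execute the pull request exception", e);
                      }

                      // The next pull is not issued immediately; it is scheduled after a fixed delay
                      // (3 seconds, per the description above).
                      DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest, PULL_TIME_DELAY_MILLS_WHEN_EXCEPTION);
                  }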

          People

            vintagewang Xiaorui Wang
            Jaskey Jaskey Lam