Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.1.0-incubating
    • Fix Version/s: 4.0.0-incubating
    • Component/s: rocketmq-store
    • Labels:
      None

      Description

      Consume queue file may hang and stop further deleting under concurrent scenario.

      If the consume queue file is concurrently held by another thread, mappedFile.destroy(interval) will only set firstShutdownTimestamp to current timestamp and available false, then return false directly.
      Check MappedFileQueue.java#deleteExpiredFileByOffset,

      if (destroy && mappedFile.destroy(1000 * 60)) {
      files.add(mappedFile);
      deleteCount++;
      } else {
      break;
      }
      the current round of deletion will be stopped and current mapped file becomes handed.

      Next time trying to delete mapped file of this mapped file queue, MappedFileQueue.java#deleteExpiredFileByOffset(offset, unitSize) line No.390
      MappedFile mappedFile = (MappedFile) mfs[i];
      SelectMappedBufferResult result = mappedFile.selectMappedBuffer(this.mappedFileSize - unitSize);
      will always return null as the mapped file is no longer available and cannot hold. Current implementation will only log a warn and exit.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user lizhanhui opened a pull request:

          https://github.com/apache/incubator-rocketmq/pull/39

          ROCKETMQ-45Delete hanged consume queue files

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/lizhanhui/incubator-rocketmq ROCKETMQ-45

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/incubator-rocketmq/pull/39.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #39


          commit 4bb113ce87c6ff52fc9af7ddf2a389e2a45853b6
          Author: Zhanhui Li <lizhanhui@apache.org>
          Date: 2017-01-12T07:02:26Z

          Delete hanged consume queue files


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user lizhanhui opened a pull request: https://github.com/apache/incubator-rocketmq/pull/39 ROCKETMQ-45 Delete hanged consume queue files You can merge this pull request into a Git repository by running: $ git pull https://github.com/lizhanhui/incubator-rocketmq ROCKETMQ-45 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-rocketmq/pull/39.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #39 commit 4bb113ce87c6ff52fc9af7ddf2a389e2a45853b6 Author: Zhanhui Li <lizhanhui@apache.org> Date: 2017-01-12T07:02:26Z Delete hanged consume queue files
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zhouxinyu commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          Hi, @lizhanhui , what's the appearance/effect of this BUG?

          Let's start from the next round.

          IMO, it's ok when ConsumeQueue-MappedFile hold failed, this file will be released by the last holder, while the empty mapped file will retain in MappedFileQueue, but the min consume queue offset will be corrected.

          It seems that the only problem is the hanged consume queue only can be deleted when restart the broker.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zhouxinyu commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 Hi, @lizhanhui , what's the appearance/effect of this BUG? Let's start from the next round. IMO, it's ok when ConsumeQueue-MappedFile hold failed, this file will be released by the last holder, while the empty mapped file will retain in MappedFileQueue, but the min consume queue offset will be corrected. It seems that the only problem is the hanged consume queue only can be deleted when restart the broker.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zhouxinyu commented on a diff in the pull request:

          https://github.com/apache/incubator-rocketmq/pull/39#discussion_r95955059

          — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java —
          @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize)

          { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); }

          + } else if (!mappedFile.isAvailable()) { // Handle hanged file.
          — End diff –

          I think it's better to add `RefCount` to the if condition.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zhouxinyu commented on a diff in the pull request: https://github.com/apache/incubator-rocketmq/pull/39#discussion_r95955059 — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java — @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize) { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); } + } else if (!mappedFile.isAvailable()) { // Handle hanged file. — End diff – I think it's better to add `RefCount` to the if condition.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user lizhanhui commented on a diff in the pull request:

          https://github.com/apache/incubator-rocketmq/pull/39#discussion_r95959190

          — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java —
          @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize)

          { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); }

          + } else if (!mappedFile.isAvailable()) { // Handle hanged file.
          — End diff –

          Why? I do not see the rationale of awaiting `RefCount` to be 0 after commit log has been deleted after `intervalForcibly` period of time.

          Show
          githubbot ASF GitHub Bot added a comment - Github user lizhanhui commented on a diff in the pull request: https://github.com/apache/incubator-rocketmq/pull/39#discussion_r95959190 — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java — @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize) { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); } + } else if (!mappedFile.isAvailable()) { // Handle hanged file. — End diff – Why? I do not see the rationale of awaiting `RefCount` to be 0 after commit log has been deleted after `intervalForcibly` period of time.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user lizhanhui commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          > what's the appearance/effect of this BUG?

          Reported bug description is consume queue files cannot be automatically deleted and took up large portion of disk.

          > IMO, it's ok when ConsumeQueue-MappedFile hold failed, this file will be released by the last holder, while the empty mapped file will retain in MappedFileQueue, but the min consume queue offset will be corrected.

          ConsumeQueue#correctMinOffset will not work as expected if consume queue file hanged. Double check the source code to figure out.

          > It seems that the only problem is the hanged consume queue only can be deleted when restart the broker.

          Not only this one. As correcting min offset is also affected, consumer client probably experiences long term of "commit log being deleted." before consuming progress catches up. There should be additional potential issues.

          Show
          githubbot ASF GitHub Bot added a comment - Github user lizhanhui commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 > what's the appearance/effect of this BUG? Reported bug description is consume queue files cannot be automatically deleted and took up large portion of disk. > IMO, it's ok when ConsumeQueue-MappedFile hold failed, this file will be released by the last holder, while the empty mapped file will retain in MappedFileQueue, but the min consume queue offset will be corrected. ConsumeQueue#correctMinOffset will not work as expected if consume queue file hanged. Double check the source code to figure out. > It seems that the only problem is the hanged consume queue only can be deleted when restart the broker. Not only this one. As correcting min offset is also affected, consumer client probably experiences long term of "commit log being deleted." before consuming progress catches up. There should be additional potential issues.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zhouxinyu commented on a diff in the pull request:

          https://github.com/apache/incubator-rocketmq/pull/39#discussion_r95964644

          — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java —
          @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize)

          { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); }

          + } else if (!mappedFile.isAvailable()) { // Handle hanged file.
          — End diff –

          When MappedFile has been `shutdown`, no more new holder appear, so the `RefCount` will to be 0 soon, so I think maybe it's more graceful ..

          Show
          githubbot ASF GitHub Bot added a comment - Github user zhouxinyu commented on a diff in the pull request: https://github.com/apache/incubator-rocketmq/pull/39#discussion_r95964644 — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java — @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize) { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); } + } else if (!mappedFile.isAvailable()) { // Handle hanged file. — End diff – When MappedFile has been `shutdown`, no more new holder appear, so the `RefCount` will to be 0 soon, so I think maybe it's more graceful ..
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zhouxinyu commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          I checked ConsumeQueue#correctMinOffset, and you are right this method won't work as expected in this situation.

          Please @vintagewang @vongosling help review this PR.

          Show
          githubbot ASF GitHub Bot added a comment - Github user zhouxinyu commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 I checked ConsumeQueue#correctMinOffset, and you are right this method won't work as expected in this situation. Please @vintagewang @vongosling help review this PR.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zhouxinyu commented on a diff in the pull request:

          https://github.com/apache/incubator-rocketmq/pull/39#discussion_r95979009

          — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java —
          @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize)

          { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); }

          + } else if (!mappedFile.isAvailable()) { // Handle hanged file.
          — End diff –

          Another thing, can we just mark `destroy=true` in `else if` section and next `if` will destroy it ?

          Show
          githubbot ASF GitHub Bot added a comment - Github user zhouxinyu commented on a diff in the pull request: https://github.com/apache/incubator-rocketmq/pull/39#discussion_r95979009 — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java — @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize) { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); } + } else if (!mappedFile.isAvailable()) { // Handle hanged file. — End diff – Another thing, can we just mark `destroy=true` in `else if` section and next `if` will destroy it ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user zhouxinyu commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          @lizhanhui Also please add some unit tests -..

          Show
          githubbot ASF GitHub Bot added a comment - Github user zhouxinyu commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 @lizhanhui Also please add some unit tests - ..
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user lizhanhui commented on a diff in the pull request:

          https://github.com/apache/incubator-rocketmq/pull/39#discussion_r97212223

          — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java —
          @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize)

          { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); }

          + } else if (!mappedFile.isAvailable()) { // Handle hanged file.
          — End diff –

          > Another thing, can we just mark destroy=true in else if section and next if will destroy it ?

          Yep, I think we can do this.

          Show
          githubbot ASF GitHub Bot added a comment - Github user lizhanhui commented on a diff in the pull request: https://github.com/apache/incubator-rocketmq/pull/39#discussion_r97212223 — Diff: store/src/main/java/org/apache/rocketmq/store/MappedFileQueue.java — @@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize) { log.info("physic min offset " + offset + ", logics in current mappedFile max offset " + maxOffsetInLogicQueue + ", delete it"); } + } else if (!mappedFile.isAvailable()) { // Handle hanged file. — End diff – > Another thing, can we just mark destroy=true in else if section and next if will destroy it ? Yep, I think we can do this.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user coveralls commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          [![Coverage Status](https://coveralls.io/builds/9784804/badge)](https://coveralls.io/builds/9784804)

          Coverage increased (+0.1%) to 25.854% when pulling *b8e53c990e0b467c2a2b766137f7b32f3b5b6f04 on lizhanhui:ROCKETMQ-45* into *b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master*.

          Show
          githubbot ASF GitHub Bot added a comment - Github user coveralls commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 [! [Coverage Status] ( https://coveralls.io/builds/9784804/badge)](https://coveralls.io/builds/9784804 ) Coverage increased (+0.1%) to 25.854% when pulling * b8e53c990e0b467c2a2b766137f7b32f3b5b6f04 on lizhanhui: ROCKETMQ-45 * into * b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master *.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user coveralls commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          [![Coverage Status](https://coveralls.io/builds/9784804/badge)](https://coveralls.io/builds/9784804)

          Coverage increased (+0.1%) to 25.854% when pulling *b8e53c990e0b467c2a2b766137f7b32f3b5b6f04 on lizhanhui:ROCKETMQ-45* into *b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master*.

          Show
          githubbot ASF GitHub Bot added a comment - Github user coveralls commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 [! [Coverage Status] ( https://coveralls.io/builds/9784804/badge)](https://coveralls.io/builds/9784804 ) Coverage increased (+0.1%) to 25.854% when pulling * b8e53c990e0b467c2a2b766137f7b32f3b5b6f04 on lizhanhui: ROCKETMQ-45 * into * b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master *.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user coveralls commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          [![Coverage Status](https://coveralls.io/builds/9784804/badge)](https://coveralls.io/builds/9784804)

          Coverage increased (+0.1%) to 25.854% when pulling *b8e53c990e0b467c2a2b766137f7b32f3b5b6f04 on lizhanhui:ROCKETMQ-45* into *b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master*.

          Show
          githubbot ASF GitHub Bot added a comment - Github user coveralls commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 [! [Coverage Status] ( https://coveralls.io/builds/9784804/badge)](https://coveralls.io/builds/9784804 ) Coverage increased (+0.1%) to 25.854% when pulling * b8e53c990e0b467c2a2b766137f7b32f3b5b6f04 on lizhanhui: ROCKETMQ-45 * into * b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master *.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user coveralls commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          [![Coverage Status](https://coveralls.io/builds/9784819/badge)](https://coveralls.io/builds/9784819)

          Coverage increased (+0.1%) to 25.814% when pulling *57111a02721ac8d70de3bb6210db4cf13f4714a9 on lizhanhui:ROCKETMQ-45* into *b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master*.

          Show
          githubbot ASF GitHub Bot added a comment - Github user coveralls commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 [! [Coverage Status] ( https://coveralls.io/builds/9784819/badge)](https://coveralls.io/builds/9784819 ) Coverage increased (+0.1%) to 25.814% when pulling * 57111a02721ac8d70de3bb6210db4cf13f4714a9 on lizhanhui: ROCKETMQ-45 * into * b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master *.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user coveralls commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          [![Coverage Status](https://coveralls.io/builds/9784819/badge)](https://coveralls.io/builds/9784819)

          Coverage increased (+0.1%) to 25.814% when pulling *57111a02721ac8d70de3bb6210db4cf13f4714a9 on lizhanhui:ROCKETMQ-45* into *b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master*.

          Show
          githubbot ASF GitHub Bot added a comment - Github user coveralls commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 [! [Coverage Status] ( https://coveralls.io/builds/9784819/badge)](https://coveralls.io/builds/9784819 ) Coverage increased (+0.1%) to 25.814% when pulling * 57111a02721ac8d70de3bb6210db4cf13f4714a9 on lizhanhui: ROCKETMQ-45 * into * b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master *.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user coveralls commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          [![Coverage Status](https://coveralls.io/builds/9784819/badge)](https://coveralls.io/builds/9784819)

          Coverage increased (+0.1%) to 25.814% when pulling *57111a02721ac8d70de3bb6210db4cf13f4714a9 on lizhanhui:ROCKETMQ-45* into *b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master*.

          Show
          githubbot ASF GitHub Bot added a comment - Github user coveralls commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 [! [Coverage Status] ( https://coveralls.io/builds/9784819/badge)](https://coveralls.io/builds/9784819 ) Coverage increased (+0.1%) to 25.814% when pulling * 57111a02721ac8d70de3bb6210db4cf13f4714a9 on lizhanhui: ROCKETMQ-45 * into * b29c318cdde225ef3a33a73e939e49e087766a28 on apache:master *.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 11ff542c4382faf3bb7a74a80329f1fcab6a07a6 in incubator-rocketmq's branch refs/heads/master from Zhanhui Li
          [ https://git-wip-us.apache.org/repos/asf?p=incubator-rocketmq.git;h=11ff542 ]

          ROCKETMQ-45Delete hanged consume queue files

          Show
          jira-bot ASF subversion and git services added a comment - Commit 11ff542c4382faf3bb7a74a80329f1fcab6a07a6 in incubator-rocketmq's branch refs/heads/master from Zhanhui Li [ https://git-wip-us.apache.org/repos/asf?p=incubator-rocketmq.git;h=11ff542 ] ROCKETMQ-45 Delete hanged consume queue files
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user lizhanhui closed the pull request at:

          https://github.com/apache/incubator-rocketmq/pull/39

          Show
          githubbot ASF GitHub Bot added a comment - Github user lizhanhui closed the pull request at: https://github.com/apache/incubator-rocketmq/pull/39
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user vongosling commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          Usually, we need 3 committers to review PR. We'd better not merge the PR if the opinion fails to unite

          Show
          githubbot ASF GitHub Bot added a comment - Github user vongosling commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 Usually, we need 3 committers to review PR. We'd better not merge the PR if the opinion fails to unite
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user lizhanhui commented on the issue:

          https://github.com/apache/incubator-rocketmq/pull/39

          > We'd better not merge the PR if the opinion fails to unite
          Agree. But this is a minor change to fix obvious corner case mis-handling. Proposing review points are carefully considered and mostly accepted. Anyway, Let's wait for three review OK next time before merging.

          Show
          githubbot ASF GitHub Bot added a comment - Github user lizhanhui commented on the issue: https://github.com/apache/incubator-rocketmq/pull/39 > We'd better not merge the PR if the opinion fails to unite Agree. But this is a minor change to fix obvious corner case mis-handling. Proposing review points are carefully considered and mostly accepted. Anyway, Let's wait for three review OK next time before merging.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 11ff542c4382faf3bb7a74a80329f1fcab6a07a6 in incubator-rocketmq's branch refs/heads/ROCKETMQ-57 from Zhanhui Li
          [ https://git-wip-us.apache.org/repos/asf?p=incubator-rocketmq.git;h=11ff542 ]

          ROCKETMQ-45Delete hanged consume queue files

          Show
          jira-bot ASF subversion and git services added a comment - Commit 11ff542c4382faf3bb7a74a80329f1fcab6a07a6 in incubator-rocketmq's branch refs/heads/ ROCKETMQ-57 from Zhanhui Li [ https://git-wip-us.apache.org/repos/asf?p=incubator-rocketmq.git;h=11ff542 ] ROCKETMQ-45 Delete hanged consume queue files
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 11ff542c4382faf3bb7a74a80329f1fcab6a07a6 in incubator-rocketmq's branch refs/heads/ROCKETMQ-54 from Zhanhui Li
          [ https://git-wip-us.apache.org/repos/asf?p=incubator-rocketmq.git;h=11ff542 ]

          ROCKETMQ-45Delete hanged consume queue files

          Show
          jira-bot ASF subversion and git services added a comment - Commit 11ff542c4382faf3bb7a74a80329f1fcab6a07a6 in incubator-rocketmq's branch refs/heads/ ROCKETMQ-54 from Zhanhui Li [ https://git-wip-us.apache.org/repos/asf?p=incubator-rocketmq.git;h=11ff542 ] ROCKETMQ-45 Delete hanged consume queue files

            People

            • Assignee:
              Yukon yukon
              Reporter:
              lizhanhui Zhanhui Li
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 2h
                2h
                Remaining:
                Remaining Estimate - 2h
                2h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development