Bookkeeper
  1. Bookkeeper
  2. BOOKKEEPER-539

ClientNotSubscribedException & doesn't receive enough messages in TestThrottlingDelivery#testServerSideThrottle

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.0
    • Component/s: hedwig-server
    • Labels:
      None

      Description

      ClientNotSubscribedException & doesn't receive enough messages failure in TestThrottlingDelivery#testServerSideThrottle.

      -------------------------------------------------------------------------------
      Test set: org.apache.hedwig.server.delivery.TestThrottlingDelivery
      -------------------------------------------------------------------------------
      Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 18.544 sec <<< FAILURE!
      testServerSideThrottle[1](org.apache.hedwig.server.delivery.TestThrottlingDelivery)  Time elapsed: 6.776 sec  <<< FAILURE!   junit.framework.AssertionFailedError: Should be expected messages with only 6 expected:<6> but was:<2>
              at junit.framework.Assert.fail(Assert.java:47)
              at junit.framework.Assert.failNotEquals(Assert.java:283)
              at junit.framework.Assert.assertEquals(Assert.java:64)
              at junit.framework.Assert.assertEquals(Assert.java:195)
              at org.apache.hedwig.server.delivery.TestThrottlingDelivery.throttleX(TestThrottlingDelivery.java:151)
              at org.apache.hedwig.server.delivery.TestThrottlingDelivery.testServerSideThrottle(TestThrottlingDelivery.java:216)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
              at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
              at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
              at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
              at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
      
      -------------------------------------------------------------------------------
      Test set: org.apache.hedwig.server.delivery.TestThrottlingDelivery
      -------------------------------------------------------------------------------
      Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 18.294 sec <<< FAILURE!
      testServerSideThrottle[1](org.apache.hedwig.server.delivery.TestThrottlingDelivery)  Time elapsed: 6.763 sec  <<< ERROR!
      org.apache.hedwig.exceptions.PubSubException$ClientNotSubscribedException: Client is not yet subscribed to Topic:            testServerSideThrottleWithHigherValue, SubscriberId: serverThrottleSub
              at org.apache.hedwig.client.netty.impl.multiplex.MultiplexHChannelManager.startDelivery(MultiplexHChannelManager.    java:221)
              at org.apache.hedwig.client.netty.impl.multiplex.MultiplexHChannelManager.startDelivery(MultiplexHChannelManager.    java:199)
              at org.apache.hedwig.client.netty.HedwigSubscriber.startDelivery(HedwigSubscriber.java:358)
              at org.apache.hedwig.server.delivery.TestThrottlingDelivery.throttleX(TestThrottlingDelivery.java:113)
              at org.apache.hedwig.server.delivery.TestThrottlingDelivery.testServerSideThrottle(TestThrottlingDelivery.java:226)          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:601)
              at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
              at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
              at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
              at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
              at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
      

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in bookkeeper-trunk2 #53 (See https://builds.apache.org/job/bookkeeper-trunk2/53/)
          BOOKKEEPER-539: ClientNotSubscribedException & doesn't receive enough messages in TestThrottlingDelivery#testServerSideThrottle (sijie) (Revision 1431869)

          Result = SUCCESS
          sijie :
          Files :

          • /zookeeper/bookkeeper/trunk/CHANGES.txt
          • /zookeeper/bookkeeper/trunk/hedwig-server/src/main/java/org/apache/hedwig/server/delivery/ChannelEndPoint.java
          • /zookeeper/bookkeeper/trunk/hedwig-server/src/main/java/org/apache/hedwig/server/delivery/FIFODeliveryManager.java
          • /zookeeper/bookkeeper/trunk/hedwig-server/src/test/java/org/apache/hedwig/server/delivery/TestFIFODeliveryManager.java
          Show
          Hudson added a comment - Integrated in bookkeeper-trunk2 #53 (See https://builds.apache.org/job/bookkeeper-trunk2/53/ ) BOOKKEEPER-539 : ClientNotSubscribedException & doesn't receive enough messages in TestThrottlingDelivery#testServerSideThrottle (sijie) (Revision 1431869) Result = SUCCESS sijie : Files : /zookeeper/bookkeeper/trunk/CHANGES.txt /zookeeper/bookkeeper/trunk/hedwig-server/src/main/java/org/apache/hedwig/server/delivery/ChannelEndPoint.java /zookeeper/bookkeeper/trunk/hedwig-server/src/main/java/org/apache/hedwig/server/delivery/FIFODeliveryManager.java /zookeeper/bookkeeper/trunk/hedwig-server/src/test/java/org/apache/hedwig/server/delivery/TestFIFODeliveryManager.java
          Hide
          Sijie Guo added a comment -

          committed as r1431869. thanks Ivan for adding test case, thanks Jiannan for verification.

          Show
          Sijie Guo added a comment - committed as r1431869. thanks Ivan for adding test case, thanks Jiannan for verification.
          Hide
          Jiannan Wang added a comment -

          patch lgtm +1
          I run with the patch for 16 hours and there is not error reported, so this patch do fix the problem.

          Show
          Jiannan Wang added a comment - patch lgtm +1 I run with the patch for 16 hours and there is not error reported, so this patch do fix the problem.
          Hide
          Hadoop QA added a comment -

          Testing JIRA BOOKKEEPER-539

          Patch 0001-BOOKKEEPER-539-ClientNotSubscribedException-doesn-t-.patch downloaded at Thu Jan 10 18:31:22 UTC 2013

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          +1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 120
          . +1 the patch does adds/modifies 1 testcase(s)
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          +1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . +1 the patch does not seem to introduce new javac warnings
          +1 FINDBUGS
          . +1 the patch does not seem to introduce new Findbugs warnings
          +1 TESTS
          . Tests run: 787
          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          +1 Overall result, good!, no -1s

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/bookkeeper-trunk-precommit-build/237/

          Show
          Hadoop QA added a comment - Testing JIRA BOOKKEEPER-539 Patch 0001-BOOKKEEPER-539-ClientNotSubscribedException-doesn-t-.patch downloaded at Thu Jan 10 18:31:22 UTC 2013 ---------------------------- +1 PATCH_APPLIES +1 CLEAN +1 RAW_PATCH_ANALYSIS . +1 the patch does not introduce any @author tags . +1 the patch does not introduce any tabs . +1 the patch does not introduce any trailing spaces . +1 the patch does not introduce any line longer than 120 . +1 the patch does adds/modifies 1 testcase(s) +1 RAT . +1 the patch does not seem to introduce new RAT warnings +1 JAVADOC . +1 the patch does not seem to introduce new Javadoc warnings +1 COMPILE . +1 HEAD compiles . +1 patch compiles . +1 the patch does not seem to introduce new javac warnings +1 FINDBUGS . +1 the patch does not seem to introduce new Findbugs warnings +1 TESTS . Tests run: 787 +1 DISTRO . +1 distro tarball builds with the patch ---------------------------- +1 Overall result, good!, no -1s The full output of the test-patch run is available at . https://builds.apache.org/job/bookkeeper-trunk-precommit-build/237/
          Hide
          Sijie Guo added a comment -

          the test case looks good. +1. thanks so much for providing the test case, Ivan Kelly

          Show
          Sijie Guo added a comment - the test case looks good. +1. thanks so much for providing the test case, Ivan Kelly
          Hide
          Ivan Kelly added a comment -

          Added a test case, which hits it reliably without the fix. Regarding the fix, it looks good, so +1 from me. Once either Sijie Guo or Jiannan Wang +1 the test to verify I understood the issue correctly, this can be pushed into trunk.

          Show
          Ivan Kelly added a comment - Added a test case, which hits it reliably without the fix. Regarding the fix, it looks good, so +1 from me. Once either Sijie Guo or Jiannan Wang +1 the test to verify I understood the issue correctly, this can be pushed into trunk.
          Hide
          Hadoop QA added a comment -

          Testing JIRA BOOKKEEPER-539

          Patch BOOKKEEPER-539.diff downloaded at Thu Jan 10 10:14:26 UTC 2013

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          -1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 120
          . -1 the patch does not add/modify any testcase
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          +1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . +1 the patch does not seem to introduce new javac warnings
          +1 FINDBUGS
          . +1 the patch does not seem to introduce new Findbugs warnings
          +1 TESTS
          . Tests run: 784
          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          -1 Overall result, please check the reported -1(s)

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/bookkeeper-trunk-precommit-build/234/

          Show
          Hadoop QA added a comment - Testing JIRA BOOKKEEPER-539 Patch BOOKKEEPER-539.diff downloaded at Thu Jan 10 10:14:26 UTC 2013 ---------------------------- +1 PATCH_APPLIES +1 CLEAN -1 RAW_PATCH_ANALYSIS . +1 the patch does not introduce any @author tags . +1 the patch does not introduce any tabs . +1 the patch does not introduce any trailing spaces . +1 the patch does not introduce any line longer than 120 . -1 the patch does not add/modify any testcase +1 RAT . +1 the patch does not seem to introduce new RAT warnings +1 JAVADOC . +1 the patch does not seem to introduce new Javadoc warnings +1 COMPILE . +1 HEAD compiles . +1 patch compiles . +1 the patch does not seem to introduce new javac warnings +1 FINDBUGS . +1 the patch does not seem to introduce new Findbugs warnings +1 TESTS . Tests run: 784 +1 DISTRO . +1 distro tarball builds with the patch ---------------------------- -1 Overall result, please check the reported -1(s) The full output of the test-patch run is available at . https://builds.apache.org/job/bookkeeper-trunk-precommit-build/234/
          Hide
          Sijie Guo added a comment -

          attach a patch to address the issue according to the comment. but I don't have a testcase for it, because it is quite difficult to simulate such race condition.

          Show
          Sijie Guo added a comment - attach a patch to address the issue according to the comment. but I don't have a testcase for it, because it is quite difficult to simulate such race condition.
          Hide
          Sijie Guo added a comment -

          this problem is similar as I described in previous comment.

          the sequence caused the problem is :

          1) subscriber #subscribe. FIFODeliveryManager#startServingSubscriber.
          2) #startServingSubscriber: callback to send subscriber response back to client.
          3) delivery manager thread is switched to execute other threads.
          4) client received #subscribe response.
          5) client #closeSubscription.
          6) #closeSubscription would do FIFODeliveryManager#stopServingSubscriber, which removes subscriber from subscriberStates mapping. (the removal isn't executed in delivery manager thread, but in netty worker thread). so actually #stopServingSubscriber do nothing, since subscriberStates is empty now, since DeliveryManagerThread doesn't get the chance to continue execution.
          7) delivery manager thread continues execution and put the active subscriber into the subscriberStates map.
          8) when next subscribe request came in, it would replace the active subscriber with a new subscriber object and sent SUBSCRIPTION_FORCE_CLOSED event to its channel. since it was multiplexed channel, so the old subscriber event killed the new one, which caused ClientNotSubscribedException.

          several places would fix here:

          1) subscribe response only sent after the active subscriber is put in subscriberStates map.
          2) removal should be executed in DeliveryRequest, running in delivery manager thread.
          3) when replace a active subscriber, do checking whether the previous subscriber and the new subscriber are in same channel. if so, we don't sent FORCE_CLOSED event. because this case, might be a retry subscribe request for a previous timeout subscribe request in a multiplexed channel.

          Show
          Sijie Guo added a comment - this problem is similar as I described in previous comment. the sequence caused the problem is : 1) subscriber #subscribe. FIFODeliveryManager#startServingSubscriber. 2) #startServingSubscriber: callback to send subscriber response back to client. 3) delivery manager thread is switched to execute other threads. 4) client received #subscribe response. 5) client #closeSubscription. 6) #closeSubscription would do FIFODeliveryManager#stopServingSubscriber, which removes subscriber from subscriberStates mapping. (the removal isn't executed in delivery manager thread, but in netty worker thread). so actually #stopServingSubscriber do nothing, since subscriberStates is empty now, since DeliveryManagerThread doesn't get the chance to continue execution. 7) delivery manager thread continues execution and put the active subscriber into the subscriberStates map. 8) when next subscribe request came in, it would replace the active subscriber with a new subscriber object and sent SUBSCRIPTION_FORCE_CLOSED event to its channel. since it was multiplexed channel, so the old subscriber event killed the new one, which caused ClientNotSubscribedException. several places would fix here: 1) subscribe response only sent after the active subscriber is put in subscriberStates map. 2) removal should be executed in DeliveryRequest, running in delivery manager thread. 3) when replace a active subscriber, do checking whether the previous subscriber and the new subscriber are in same channel. if so, we don't sent FORCE_CLOSED event. because this case, might be a retry subscribe request for a previous timeout subscribe request in a multiplexed channel.
          Hide
          Sijie Guo added a comment -

          Attach another log about ClientNotSubscribedException problem, found in Jiannan Wang's environment.

          Show
          Sijie Guo added a comment - Attach another log about ClientNotSubscribedException problem, found in Jiannan Wang 's environment.
          Hide
          Sijie Guo added a comment -

          I would prefer that Jiannan Wang could verify it to ensure that BOOKKEEPER-529 did resolve this issue, since this issue found when he ran the tests. Actually, I could not reproduce that issue on my mac.

          Show
          Sijie Guo added a comment - I would prefer that Jiannan Wang could verify it to ensure that BOOKKEEPER-529 did resolve this issue, since this issue found when he ran the tests. Actually, I could not reproduce that issue on my mac.
          Hide
          Uma Maheswara Rao G added a comment -

          If that is the case, we can close this as duplicate with BK-529 right?

          Show
          Uma Maheswara Rao G added a comment - If that is the case, we can close this as duplicate with BK-529 right?
          Hide
          Sijie Guo added a comment -

          I think the patch in BOOKKEEPER-529 already fixed this issue. so I would move it from 4.3.0 to 4.2.0.

          Show
          Sijie Guo added a comment - I think the patch in BOOKKEEPER-529 already fixed this issue. so I would move it from 4.3.0 to 4.2.0.
          Hide
          Sijie Guo added a comment -

          the test case failed in Multiplex hedwig client. it was a race condition in FIFODeliveryManager#stopServingSubscriber.

          it could be reproduced in following sequence.

          1) subscriber #subscribe. FIFODeliveryManager#startServingSubscriber.
          2) subscriber #closesub. FIFODeliveryManager#stopServingSubscriber (it is important that currently #stopServingSubscriber doesn't remove active subscriber from the mapping)
          3) subscriber #subscribe. #startServingSubscriber replaced the active subscriber created at 2) with a newly active subscriber. after the replacement, it inserted another #stopServingSubscriber operatio in the queue.
          4) the #stopServingSubscriber operation enqueued in 3) executed and delivered a FORCED_CLOSE_SUBSCRIPTION event to client,which would clear the subscription state established in 3) at the client side.

          so

          a) if the clear operation executed before startDelivery, #startDelivery would throw ClientNotSubscribedException.
          b) if the clear operation executed after startDelivery, it might cause 'doesn't receive enough messages'.

          beside this critical issue, I found that there is a race condition over connected variable in ActiveSubscriber from the attached log. 'connected' variable isn't volatile, but it was used between DeliveryManager thread and ReadAheadCacheThread. some message would still be delivered to client side even someone close the subscription, because two threads see different value of 'connected' in different CPU cores.

          Show
          Sijie Guo added a comment - the test case failed in Multiplex hedwig client. it was a race condition in FIFODeliveryManager#stopServingSubscriber. it could be reproduced in following sequence. 1) subscriber #subscribe. FIFODeliveryManager#startServingSubscriber. 2) subscriber #closesub. FIFODeliveryManager#stopServingSubscriber (it is important that currently #stopServingSubscriber doesn't remove active subscriber from the mapping) 3) subscriber #subscribe. #startServingSubscriber replaced the active subscriber created at 2) with a newly active subscriber. after the replacement, it inserted another #stopServingSubscriber operatio in the queue. 4) the #stopServingSubscriber operation enqueued in 3) executed and delivered a FORCED_CLOSE_SUBSCRIPTION event to client,which would clear the subscription state established in 3) at the client side. so a) if the clear operation executed before startDelivery, #startDelivery would throw ClientNotSubscribedException. b) if the clear operation executed after startDelivery, it might cause 'doesn't receive enough messages'. beside this critical issue, I found that there is a race condition over connected variable in ActiveSubscriber from the attached log. 'connected' variable isn't volatile, but it was used between DeliveryManager thread and ReadAheadCacheThread. some message would still be delivered to client side even someone close the subscription, because two threads see different value of 'connected' in different CPU cores.
          Hide
          Sijie Guo added a comment -

          here are the detail logs of the failure test case. thanks Jiannan Wang for reporting.

          Show
          Sijie Guo added a comment - here are the detail logs of the failure test case. thanks Jiannan Wang for reporting.

            People

            • Assignee:
              Sijie Guo
              Reporter:
              Sijie Guo
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development