  Qpid Proton / PROTON-961

messenger doesn't handle received multi-frame transfers

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1
    • Fix Version/s: 0.10
    • Component/s: proton-c
    • Labels: None

      Description

      See QPID-6651 for a reproducer.

          Activity

          gsim Gordon Sim added a comment -

          The pni_pump_in function in messenger.c proceeds to process incoming deliveries even if they are partial. There is a check for pn_delivery_partial at the start, but I'm not quite sure what it is trying to test there.

          The following change seems to fix the problem with large incoming messages and doesn't break any of the tests. Is it possible the original check was just written incorrectly?

          --- a/proton-c/src/messenger/messenger.c
          +++ b/proton-c/src/messenger/messenger.c
          @@ -987,7 +987,7 @@ static void pn_condition_report(const char *pfx, pn_condition_t *condition)
           int pni_pump_in(pn_messenger_t *messenger, const char *address, pn_link_t *receiver)
           {
             pn_delivery_t *d = pn_link_current(receiver);
          -  if (!pn_delivery_readable(d) && !pn_delivery_partial(d)) {
          +  if (!pn_delivery_readable(d) || pn_delivery_partial(d)) {
               return 0;
             }
          
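          For clarity, here is the corrected guard again with the intended semantics spelled out in comments. This is an illustrative sketch rather than the committed code; the functions are the proton-c APIs already used in the diff above.

          /* Sketch: intended semantics of the corrected check. */
          pn_delivery_t *d = pn_link_current(receiver);
          /* pn_delivery_readable(d): d is the link's current delivery and has
           * bytes pending to read.
           * pn_delivery_partial(d): more transfer frames for this delivery are
           * still expected from the peer. */
          if (!pn_delivery_readable(d) || pn_delivery_partial(d)) {
            /* Either nothing is readable yet, or the message is still arriving
             * across multiple frames: defer processing until it is complete. */
            return 0;
          }
          /* Readable and complete: safe to decode as a whole message. The old
           * '&& !pn_delivery_partial(d)' form fell through to here for partial
           * deliveries, processing incomplete message data. */
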
          astitcher Andrew Stitcher added a comment -

          Is there a test for large messages? I'm guessing there is not; otherwise it would fail.

          If so, then this change would be fixing a real issue (just not one that currently has a test).

          So I'd strongly suggest adding a new test, one that fails without this change, when you check it in.

          One of the major purposes of the tests is to ensure changes do not regress, so if the tests still pass after a change then you should be able to assume that the change is good.

          Of course, the flip side of this is that we need to add failing tests for all the bugs we fix too (as in, failing before the fixing change), or else we don't get to find regressions.

          kgiusti Ken Giusti added a comment -

          From my understanding of the code, I think your change is the correct fix.

          There are tests for receiving fragmented messages and large messages in the engine unit tests, but not for messenger.

          kgiusti Ken Giusti added a comment -

          Let me retract that - this isn't the correct fix.

          I tried hacking a copy of messenger to set the max-frame on its connections to 128 bytes. A few of the unit tests start failing with that setting, so I applied Gordon's fix above. His fix prevents a few of the failures, but proton_tests.messenger.MessengerTest.testSendReceive1M fails.

          Turns out that even with that patch, Messenger will hang for messages >= 1MB. This is caused by the incoming window closing on the receiver: since the transfers are partial, messenger never calls pn_link_advance or pn_link_recv, which is what replenishes the incoming window.
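
          To make that concrete, here is a minimal sketch (my own illustration of the behaviour described above, not actual messenger code) of a receive path that drains buffered bytes with pn_link_recv even while the delivery is still partial, since consuming those bytes is what lets the engine replenish the incoming window:

          #include <proton/link.h>
          #include <proton/delivery.h>

          /* Hypothetical helper: read whatever bytes have already arrived for
           * the current delivery, even while pn_delivery_partial() is true.
           * Buffer management is deliberately simplified. */
          static ssize_t drain_current(pn_link_t *receiver, char *buf, size_t size)
          {
            pn_delivery_t *d = pn_link_current(receiver);
            if (!d || !pn_delivery_readable(d)) return 0;

            ssize_t total = 0;
            while ((size_t)total < size) {
              ssize_t n = pn_link_recv(receiver, buf + total, size - total);
              if (n <= 0) break;          /* PN_EOS, error, or nothing buffered yet */
              total += n;
            }
            if (!pn_delivery_partial(d)) {
              pn_link_advance(receiver);  /* message complete: step to the next one */
            }
            return total;
          }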

          rhs Rafael H. Schloming added a comment -

          I think the fix may be correct, but incomplete. I haven't looked at this code in a while, but it looks to me like the original code is wrong, and with the fix it will behave better in at least some situations. It sounds like the situation Ken is talking about is simply not dealt with in the messenger code currently, i.e. it doesn't attempt to stream large messages into memory, hence they can block things up if they are larger than whatever the engine is willing to buffer.

          gemmellr Robbie Gemmell added a comment - - edited

          I'm sure it probably doesn't change the effect, but it is still worth noting that the max frame size MUST be 512 bytes or higher.

          I don't think the problem you saw necessarily points to an issue with the fix Gordon suggested, but rather to an issue with the engine in general...

          The problem presumably happens at 1MB because the session defaults to a 1MB 'incoming bytes capacity'. If a local max-frame-size is set, this capacity is used to calculate the incoming window size as the number of max-sized frames that could be accepted at the time without breaching the 'capacity'. If a local max-frame-size is not specified (the default), then a fixed window size of 2^31-1 is 'calculated' each time.

          Proton-j, and by the looks of it proton-c, both recalculate their session incoming window whenever they issue a flow, which, amongst other cases, they will look to do while processing the transport if the session incoming window hits 0. The issue is presumably that if the incoming window hits 0 (because a local max-frame-size was set, and the resulting window has now been used up) while the session has also received its capacity of incoming bytes, then the recalculated window will still be 0, which is what happens if you send messages over <session-capacity> in size.

          In short, if a local max-frame-size is being set, then the incoming session window behaviour seems basically broken for messages over <session-capacity>. If so, this likely hasn't been noticed for the most part because few are setting max-frame-size.
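
          To illustrate the arithmetic with a simplified model (the names are invented for illustration, not the actual proton internals): with the default 1MB capacity and a 128-byte local max-frame-size, the initial window is 1048576 / 128 = 8192 frames, and once 1MB of unread partial-message bytes is buffered, the recalculated window is 0 / 128 = 0.

          #include <stddef.h>
          #include <stdint.h>

          /* Simplified model of the incoming-window calculation described
           * above; these are illustrative names, not real proton-c fields. */
          static uint32_t incoming_window(size_t capacity,       /* e.g. the 1MB default  */
                                          size_t buffered_bytes, /* unread incoming bytes */
                                          uint32_t max_frame)    /* 0 if no local setting */
          {
            if (max_frame == 0)
              return 2147483647u;  /* no local max-frame-size: fixed 2^31 - 1 window */
            size_t avail = capacity > buffered_bytes ? capacity - buffered_bytes : 0;
            /* A buffered but unread message of >= capacity bytes drives this to 0,
             * and the window stays 0 on every recalculation: the session stalls. */
            return (uint32_t)(avail / max_frame);
          }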

          EDIT: fixed remote -> local for frame size usage.

          gemmellr Robbie Gemmell added a comment - - edited

          Took long enough writing my reply and looking at the behaviour that I missed that Rafi had already responded.

          I agree that the fix seems better than what occurs now; as per the above comments, the rest seems separate. I think we should apply it before the 0.10 release, and raise a further issue for the 'messages larger than <session-capacity> while local max-frame-size is set' problem, which can be done at some point after.

          (Also, would it be possible to reach into messenger and alter the session capacity currently, as a workaround?)

          EDIT: fixed remote -> local for frame size usage.

          kgiusti Ken Giusti added a comment -

          Thanks for the analysis - I agree that Gordon's fix is a good start and should be committed.

          I haven't thought this through entirely, but for 0.10 would it make sense to add a temporary workaround in messenger that automagically bumps the session->incoming_capacity from 1MB to some huge number if the max-frame is configured? Would that avoid the session stall for 'reasonable' message sizes?
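
          Something along these lines, perhaps (a rough sketch only: pn_session_set_incoming_capacity is the real proton-c call, but the wrapper function and the 64MB figure are invented for illustration):

          #include <stdbool.h>
          #include <proton/session.h>

          /* Hypothetical temporary workaround, not committed code: if a max
           * frame size is configured, enlarge the session's incoming capacity
           * beyond the 1MB default so the incoming window stays open for
           * larger messages. 64MB is an arbitrary example value. */
          static void bump_incoming_capacity(pn_session_t *ssn, bool max_frame_set)
          {
            if (max_frame_set) {
              pn_session_set_incoming_capacity(ssn, 64 * 1024 * 1024);
            }
          }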

          gemmellr Robbie Gemmell added a comment -

          In hindsight, I don't think there would be a need for that. I overlooked that this should only have an effect if the Messenger has set a local maxFrameSize, which is what I think you changed it to do (with this also influencing the remote max frame size as seen by the other messenger instance). If it doesn't do that (and I don't think it currently does), then I don't think it should have a problem, because the advertised incoming window will always be set to 2^31-1 independent of the frame size or number of transfer frames received (unless 2^31-1 transfer frames are received without a flow going in the opposite direction on the session).

          jira-bot ASF subversion and git services added a comment -

          Commit 092d41d4453a8f5df0379a3ef87512596153df39 in qpid-proton's branch refs/heads/master from Robert Gemmell
          [ https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;h=092d41d ]

          PROTON-961: fix check for partial delivery, allow multi-frame messages to be received by Messenger

          Change suggested by Gordon Sim

          gemmellr Robbie Gemmell added a comment -

          I've checked in Gordon's suggested fix.

          It passes all the existing tests, and I verified it fixed the problem by using the 'recv' Messenger example against the Java broker: sending it 2MB messages from the JMS client followed by small messages from the 'send' Messenger example, and checking that the frames on the wire were as expected using a protocol trace. As noted earlier, the incoming capacity starvation issue will only occur if a local max-frame-size is in effect, and Messenger doesn't do that or let you configure it yourself, so the 2MB messages worked fine despite the 1MB session capacity.

          I tried adding a new test and have given up. You can't use Messenger on both sides, because it won't fragment sent messages unless a remote max frame size is set, and you can't configure it to do that. I then tried using the Reactor/Container bits on one side to get the effect, and couldn't see how to do what I wanted there either. If someone can point out a simple way of doing it, I'll add one.

          philpreston Philip Preston added a comment -

          I'll test this against the original OpenMAMA work where I found the issue.

          Many thanks for the help.

          Phil


            People

            • Assignee: Unassigned
            • Reporter: gsim Gordon Sim
            • Votes: 1
            • Watchers: 7
