Qpid Proton

[PROTON-401] Ordering issue prevents credit drain from working properly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: proton-c
    • Labels:
      None

      Description

      If the sending link calls pn_link_drained() to indicate that it has sent all pending data, and afterwards it receives a flow frame with drain=true from the peer, then the drain never completes.

      The ordering is the problem: if the flow frame with drain=true is received BEFORE the sender calls pn_link_drained(), then the drain completes as expected.
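
      For illustration, a minimal sketch of the failing sender-side sequence (the surrounding application code is assumed; only the pn_link_* calls are proton API):

      #include <proton/engine.h>

      /* Sketch of the ordering that triggers the bug: the sender reports
       * drained first, and only afterwards does the peer's drain request
       * arrive. */
      void on_output_empty(pn_link_t *sender)
      {
          /* 1. The application has sent everything it has and says so. */
          pn_link_drained(sender);

          /* 2. Later, a flow frame with drain=true arrives from the peer.
           *    The earlier drained indication is not remembered, so no
           *    answering flow frame is emitted and the receiver's drain
           *    never completes.  If the flow frame had arrived before
           *    step 1, the drain would have completed normally. */
      }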

      Attachments

      1. drain-error.patch
        2 kB
        Ken Giusti
      2. drain-hack.patch
        3 kB
        Ken Giusti

        Issue Links

          Activity

          Ted Ross added a comment -

          The problem has been solved by using the pn_link_get_drain() query to determine when the drain mode has changed.
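
          For illustration, a rough sketch of that pattern, assuming the application visits each sender on every I/O pass and tracks the previous drain mode itself (the was_draining and have_messages parameters are application state, not proton API):

          #include <proton/engine.h>
          #include <stdbool.h>

          /* Detect a change in drain mode and hand unused credit back when
           * there is nothing left to send. */
          void check_drain(pn_link_t *sender, bool *was_draining, bool have_messages)
          {
              bool draining = pn_link_get_drain(sender);
              if (draining && !*was_draining && !have_messages) {
                  /* Drain was just requested and we have nothing to send:
                   * release the credit so the receiver's drain completes. */
                  pn_link_drained(sender);
              }
              *was_draining = draining;
          }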

          Ted Ross added a comment -

          If there is a way for the Engine user to determine the drain mode of a sender, I believe I can solve my problem.

          Ted Ross added a comment -

          My code only calls pn_link_drained once when the output buffer empties. Because of this issue, subsequent drain requests (like from qpid::messaging::Receiver::fetch) are not responded to and hang (even if there is a timeout).

          Since I'm doing the full output processing for every IO event, it's possible that I may have a relatively easy workaround for this. Currently, I don't always re-process the connector in proton after calling pn_link_drained.

          Rafael H. Schloming added a comment -

          I'll take a look as soon as I wrap up the JIRA I'm currently working on.

          BTW, what do you mean by "hammer"? Is your code actually busy looping, or is it simply having to do a linear search whenever the connection has I/O?

          Ted Ross added a comment -

          That works for me. I just don't want to "poll" the drain in case someone might care.

          The way it is now, a broker/intermediary that is idle with no enqueued messages needs to hammer away at pn_link_drained for each outgoing link.

          qpid::messaging::fetch now hangs when receiving from Dispatch because of this. When can we get a fix?

          Rafael H. Schloming added a comment -

          I can see how the level set scenario Ted describes could work. My concern, however, is that it would force a possibly undesirable amount of coupling between sources of messages and their consumption point. In the particular case of the queue scenario it seems like the head and the tail of the queue would be forced to be synchronized. In the more general case, it's just hard to know whether it would always be possible to call pn_link_offered in time.

          Another option that might be worth considering is adding the notion of "interesting conditions" at the link level along with a query API at the connection level to allow an application to efficiently navigate to the set of interesting links. I believe we have considered doing this in the past and deferred it because we found ways to work around the need for it, but it strikes me as a generally useful mechanism that might improve both this and other scenarios. I'm thinking an application could configure the link to say "I'm interested in knowing when you have positive credit", or "I'm interested in knowing when you get a drain request". I would think with this kind of mechanism the application could call drained when it is needed by the current semantics, and do it in an efficient way. I think this would also allow some other very useful patterns, e.g. allowing an application to efficiently divert messages to whatever links have available credit.

          Ken Giusti added a comment -

          For Rafi's consideration.

          Ted Ross added a comment -

          To put the proposal in the context of Rafi's scenario:

          If the queue implementation invokes pn_link_offered on the outbound link prior to accepting the message on the inbound link, the strict semantics will be met.
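
          For illustration, a minimal sketch of that ordering, assuming a delivery handler in an engine-based queue implementation (the outbound link and queue_depth bookkeeping are application state):

          #include <proton/engine.h>

          /* Advertise the message on the outbound link before accepting it
           * inbound, so a drain issued after the accept is guaranteed to
           * see at least one offered message. */
          void on_inbound_message(pn_delivery_t *delivery, pn_link_t *outbound,
                                  int queue_depth)
          {
              /* 1. Offer the soon-to-be-enqueued message downstream. */
              pn_link_offered(outbound, queue_depth + 1);

              /* 2. Only then accept and settle the inbound transfer. */
              pn_delivery_update(delivery, PN_ACCEPTED);
              pn_delivery_settle(delivery);
          }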

          Ted Ross added a comment -

          I propose that proton consider the 'drained' state of the application to be level-sensitive and indicated by calls to 'drained' and 'offered'. Whenever the application has at least one message to send, it will call pn_link_offered to indicate how many messages are ready to go. Once the last message has been sent, the application calls 'drained' to indicate that there are no more messages for now (and until a subsequent call to offered).

          Note that we could remove the pn_link_drained call altogether and replace it with an offered value of zero.
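
          For illustration, a sketch of the proposed level-sensitive usage (the fifo type and its depth field stand in for the application's per-link queue):

          #include <proton/engine.h>

          typedef struct { int depth; } sender_fifo_t;  /* application queue state */

          /* Level-sensitive signalling: 'offered' whenever at least one
           * message is ready, 'drained' once the last one has gone out. */
          void update_availability(pn_link_t *sender, const sender_fifo_t *fifo)
          {
              if (fifo->depth > 0) {
                  pn_link_offered(sender, fifo->depth);   /* messages available */
              } else {
                  pn_link_drained(sender);                /* nothing more for now */
              }
          }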

          Ted Ross added a comment -

          I don't believe the proposed solution really works.

          An intermediary with many outgoing links will call 'drained' when the link's fifo/queue is empty (i.e. the last message was just sent). It will subsequently call 'offered' when a new message is enqueued.

          In order to call 'drained' more frequently, the process would have to use the writability of the link's connection as a trigger. This will lead to very high CPU utilization calling 'drained' repeatedly or will force the developer to compromise with some complicated form of hold-off.

          Ken Giusti added a comment -

          I see your point - I'll update the PROTON-200 patch to call drained as you explain in your comment.

          Rafael H. Schloming added a comment -

          After looking at the proposed patch and tests, I believe that this isn't really a bug but rather a misunderstanding of how the API is supposed to behave. The patch effectively modifies the engine to remember that the application previously indicated that it was drained. This will, however, introduce race conditions, since the engine can't actually know whether the application is currently drained or simply was drained at some point in the past.

          To understand the semantics, it helps to imagine an application that has implemented a message queue exposed via AMQP at both ends, with strict semantics regarding "emptiness". In such a scenario, you should be able to push a message onto the tail of the queue, verify that it is accepted, and then drain from the head of the queue and be guaranteed of receiving at least one message. With this patch you would no longer have that guarantee. You would have a race condition where the engine remembers the queue used to be empty and drains the credit even though there is a message to supply.

          I believe the fix for the scenario is for the application to call drained more frequently, i.e. whenever it wakes up and there is positive credit on the link and no messages to send.
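
          For illustration, a rough sketch of that recommendation, run for every sender on each wakeup of the application's I/O loop (the have_messages flag stands in for the application's own queue state):

          #include <proton/engine.h>
          #include <stdbool.h>

          /* If the peer has granted credit we cannot currently use, report
           * drained so any pending drain request is answered promptly. */
          void on_wakeup(pn_link_t *sender, bool have_messages)
          {
              if (!have_messages && pn_link_credit(sender) > 0) {
                  pn_link_drained(sender);
              }
          }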

          Ken Giusti added a comment -

          Reviewboard link: https://reviews.apache.org/r/13652/

          Ken Giusti added a comment -

          Attached drain-hack.patch.

          This is a hack I've used to work around the problem for now. Not formal - causes some unit test failures.

          Ken Giusti added a comment -

          A python test that highlights the problem.

          Turn on frame tracing first:

          export PN_TRACE_FRM=1

          then run the test. At the point where the receiver issues the first drain, no flow is sent as a reply by the sender.
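
          For reference, a rough C sketch of the receiver side of this scenario (not the attached python test): the receiver issues the drain and, with PN_TRACE_FRM=1, the answering flow frame from the sender should be visible; with this bug it never appears.

          #include <proton/engine.h>

          /* Request a drain: grant credit with drain=true, which obliges the
           * sender to either use the credit or return it in a flow frame. */
          void request_drain(pn_link_t *receiver)
          {
              pn_link_drain(receiver, 10);
          }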


            People

            • Assignee:
              Rafael H. Schloming
              Reporter:
              Ken Giusti
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Development