Qpid
  1. Qpid
  2. QPID-3875

Reduce processing overhead when updating multiple statistic counters

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.16
    • Fix Version/s: 0.16
    • Component/s: C++ Broker
    • Labels:
      None

      Description

      While running cachegrind analysis of a simple traffic flow across the broker, I noticed that fetching the per-thread statistics had an overall instruction cost that is surprisingly long.

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        2d 3h 57m 1 Ken Giusti 02/Mar/12 20:53
        Resolved Resolved Closed Closed
        513d 22h 12m 1 Justin Ross 29/Jul/13 20:06
        Justin Ross made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Ken Giusti made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.16 [ 12319870 ]
        Fix Version/s Future [ 12315490 ]
        Resolution Fixed [ 1 ]
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4107/#review5465
        -----------------------------------------------------------

        Ship it!

        • Andrew

        On 2012-02-29 18:42:59, Kenneth Giusti wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4107/

        -----------------------------------------------------------

        (Updated 2012-02-29 18:42:59)

        Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross.

        Summary

        -------

        For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters.

        On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver):

        TRUNK

        msg/sec

        Sender1 : Sender2 : Receiver

        25825 : 25359 : 43805

        25251 : 25159 : 43379

        24564 : 24211 : 38836

        PATCHED

        msg/sec

        Sender1 : Sender2 : Receiver

        26015 : 25849 : 44473

        26197 : 25765 : 44291

        25826 : 25476 : 43904

        This addresses bug qpid-3875.

        https://issues.apache.org/jira/browse/qpid-3875

        Diffs

        -----

        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123

        /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123

        Diff: https://reviews.apache.org/r/4107/diff

        Testing

        -------

        make check

        Thanks,

        Kenneth

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/#review5465 ----------------------------------------------------------- Ship it! Andrew On 2012-02-29 18:42:59, Kenneth Giusti wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/ ----------------------------------------------------------- (Updated 2012-02-29 18:42:59) Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross. Summary ------- For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters. On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver): TRUNK msg/sec Sender1 : Sender2 : Receiver 25825 : 25359 : 43805 25251 : 25159 : 43379 24564 : 24211 : 38836 PATCHED msg/sec Sender1 : Sender2 : Receiver 26015 : 25849 : 44473 26197 : 25765 : 44291 25826 : 25476 : 43904 This addresses bug qpid-3875. https://issues.apache.org/jira/browse/qpid-3875 Diffs ----- /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123 /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123 Diff: https://reviews.apache.org/r/4107/diff Testing ------- make check Thanks, Kenneth
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4107/#review5464
        -----------------------------------------------------------

        Ship it!

        I don't love this because it bypasses the "api" for management objects. But... I can't think of anything better. And because we're implementing and instrumenting a fast-path, it is in some cases appropriate to design a "layer-violation" such as this for the sake of performance.

        • Ted

        On 2012-02-29 18:42:59, Kenneth Giusti wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4107/

        -----------------------------------------------------------

        (Updated 2012-02-29 18:42:59)

        Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross.

        Summary

        -------

        For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters.

        On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver):

        TRUNK

        msg/sec

        Sender1 : Sender2 : Receiver

        25825 : 25359 : 43805

        25251 : 25159 : 43379

        24564 : 24211 : 38836

        PATCHED

        msg/sec

        Sender1 : Sender2 : Receiver

        26015 : 25849 : 44473

        26197 : 25765 : 44291

        25826 : 25476 : 43904

        This addresses bug qpid-3875.

        https://issues.apache.org/jira/browse/qpid-3875

        Diffs

        -----

        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123

        /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123

        Diff: https://reviews.apache.org/r/4107/diff

        Testing

        -------

        make check

        Thanks,

        Kenneth

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/#review5464 ----------------------------------------------------------- Ship it! I don't love this because it bypasses the "api" for management objects. But... I can't think of anything better. And because we're implementing and instrumenting a fast-path, it is in some cases appropriate to design a "layer-violation" such as this for the sake of performance. Ted On 2012-02-29 18:42:59, Kenneth Giusti wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/ ----------------------------------------------------------- (Updated 2012-02-29 18:42:59) Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross. Summary ------- For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters. On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver): TRUNK msg/sec Sender1 : Sender2 : Receiver 25825 : 25359 : 43805 25251 : 25159 : 43379 24564 : 24211 : 38836 PATCHED msg/sec Sender1 : Sender2 : Receiver 26015 : 25849 : 44473 26197 : 25765 : 44291 25826 : 25476 : 43904 This addresses bug qpid-3875. https://issues.apache.org/jira/browse/qpid-3875 Diffs ----- /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123 /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123 Diff: https://reviews.apache.org/r/4107/diff Testing ------- make check Thanks, Kenneth
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4107/#review5463
        -----------------------------------------------------------

        Ship it!

        (For consistency it might be nice to have the txn stats (txnMsgEnqueues/Dequeues) use the same mechanism?)

        • Gordon

        On 2012-02-29 18:42:59, Kenneth Giusti wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4107/

        -----------------------------------------------------------

        (Updated 2012-02-29 18:42:59)

        Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross.

        Summary

        -------

        For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters.

        On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver):

        TRUNK

        msg/sec

        Sender1 : Sender2 : Receiver

        25825 : 25359 : 43805

        25251 : 25159 : 43379

        24564 : 24211 : 38836

        PATCHED

        msg/sec

        Sender1 : Sender2 : Receiver

        26015 : 25849 : 44473

        26197 : 25765 : 44291

        25826 : 25476 : 43904

        This addresses bug qpid-3875.

        https://issues.apache.org/jira/browse/qpid-3875

        Diffs

        -----

        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123

        /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123

        Diff: https://reviews.apache.org/r/4107/diff

        Testing

        -------

        make check

        Thanks,

        Kenneth

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/#review5463 ----------------------------------------------------------- Ship it! (For consistency it might be nice to have the txn stats (txnMsgEnqueues/Dequeues) use the same mechanism?) Gordon On 2012-02-29 18:42:59, Kenneth Giusti wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/ ----------------------------------------------------------- (Updated 2012-02-29 18:42:59) Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross. Summary ------- For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters. On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver): TRUNK msg/sec Sender1 : Sender2 : Receiver 25825 : 25359 : 43805 25251 : 25159 : 43379 24564 : 24211 : 38836 PATCHED msg/sec Sender1 : Sender2 : Receiver 26015 : 25849 : 44473 26197 : 25765 : 44291 25826 : 25476 : 43904 This addresses bug qpid-3875. https://issues.apache.org/jira/browse/qpid-3875 Diffs ----- /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123 /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123 Diff: https://reviews.apache.org/r/4107/diff Testing ------- make check Thanks, Kenneth
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4107/
        -----------------------------------------------------------

        Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross.

        Summary
        -------

        For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters.

        On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver):

        TRUNK
        msg/sec
        Sender1 : Sender2 : Receiver
        25825 : 25359 : 43805
        25251 : 25159 : 43379
        24564 : 24211 : 38836

        PATCHED
        msg/sec
        Sender1 : Sender2 : Receiver
        26015 : 25849 : 44473
        26197 : 25765 : 44291
        25826 : 25476 : 43904

        This addresses bug qpid-3875.
        https://issues.apache.org/jira/browse/qpid-3875

        Diffs


        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123
        /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123
        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123
        /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123
        /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123

        Diff: https://reviews.apache.org/r/4107/diff

        Testing
        -------

        make check

        Thanks,

        Kenneth

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/ ----------------------------------------------------------- Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross. Summary ------- For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters. On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver): TRUNK msg/sec Sender1 : Sender2 : Receiver 25825 : 25359 : 43805 25251 : 25159 : 43379 24564 : 24211 : 38836 PATCHED msg/sec Sender1 : Sender2 : Receiver 26015 : 25849 : 44473 26197 : 25765 : 44291 25826 : 25476 : 43904 This addresses bug qpid-3875. https://issues.apache.org/jira/browse/qpid-3875 Diffs /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123 /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123 Diff: https://reviews.apache.org/r/4107/diff Testing ------- make check Thanks, Kenneth
        Ken Giusti made changes -
        Field Original Value New Value
        Attachment CG-statspatch.tgz [ 12516581 ]
        Hide
        Ken Giusti added a comment -

        Attaching the cachegrind analysis for both pre and post patch.

        Notice the overall inclusive time spent in qpid::management::ManagementObject::getThreadIndex().

        Show
        Ken Giusti added a comment - Attaching the cachegrind analysis for both pre and post patch. Notice the overall inclusive time spent in qpid::management::ManagementObject::getThreadIndex().
        Ken Giusti created issue -

          People

          • Assignee:
            Ken Giusti
            Reporter:
            Ken Giusti
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development