Qpid
  1. Qpid
  2. QPID-3875

Reduce processing overhead when updating multiple statistic counters

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.16
    • Fix Version/s: 0.16
    • Component/s: C++ Broker
    • Labels:
      None

      Description

      While running cachegrind analysis of a simple traffic flow across the broker, I noticed that fetching the per-thread statistics had an overall instruction cost that is surprisingly long.

        Activity

        Justin Ross made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Ken Giusti made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.16 [ 12319870 ]
        Fix Version/s Future [ 12315490 ]
        Resolution Fixed [ 1 ]
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4107/#review5465
        -----------------------------------------------------------

        Ship it!

        • Andrew

        On 2012-02-29 18:42:59, Kenneth Giusti wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4107/

        -----------------------------------------------------------

        (Updated 2012-02-29 18:42:59)

        Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross.

        Summary

        -------

        For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters.

        On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver):

        TRUNK

        msg/sec

        Sender1 : Sender2 : Receiver

        25825 : 25359 : 43805

        25251 : 25159 : 43379

        24564 : 24211 : 38836

        PATCHED

        msg/sec

        Sender1 : Sender2 : Receiver

        26015 : 25849 : 44473

        26197 : 25765 : 44291

        25826 : 25476 : 43904

        This addresses bug qpid-3875.

        https://issues.apache.org/jira/browse/qpid-3875

        Diffs

        -----

        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123

        /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123

        Diff: https://reviews.apache.org/r/4107/diff

        Testing

        -------

        make check

        Thanks,

        Kenneth

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/#review5465 ----------------------------------------------------------- Ship it! Andrew On 2012-02-29 18:42:59, Kenneth Giusti wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/ ----------------------------------------------------------- (Updated 2012-02-29 18:42:59) Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross. Summary ------- For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters. On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver): TRUNK msg/sec Sender1 : Sender2 : Receiver 25825 : 25359 : 43805 25251 : 25159 : 43379 24564 : 24211 : 38836 PATCHED msg/sec Sender1 : Sender2 : Receiver 26015 : 25849 : 44473 26197 : 25765 : 44291 25826 : 25476 : 43904 This addresses bug qpid-3875. https://issues.apache.org/jira/browse/qpid-3875 Diffs ----- /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123 /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123 Diff: https://reviews.apache.org/r/4107/diff Testing ------- make check Thanks, Kenneth
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4107/#review5464
        -----------------------------------------------------------

        Ship it!

        I don't love this because it bypasses the "api" for management objects. But... I can't think of anything better. And because we're implementing and instrumenting a fast-path, it is in some cases appropriate to design a "layer-violation" such as this for the sake of performance.

        • Ted

        On 2012-02-29 18:42:59, Kenneth Giusti wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4107/

        -----------------------------------------------------------

        (Updated 2012-02-29 18:42:59)

        Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross.

        Summary

        -------

        For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters.

        On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver):

        TRUNK

        msg/sec

        Sender1 : Sender2 : Receiver

        25825 : 25359 : 43805

        25251 : 25159 : 43379

        24564 : 24211 : 38836

        PATCHED

        msg/sec

        Sender1 : Sender2 : Receiver

        26015 : 25849 : 44473

        26197 : 25765 : 44291

        25826 : 25476 : 43904

        This addresses bug qpid-3875.

        https://issues.apache.org/jira/browse/qpid-3875

        Diffs

        -----

        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123

        /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123

        Diff: https://reviews.apache.org/r/4107/diff

        Testing

        -------

        make check

        Thanks,

        Kenneth

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/#review5464 ----------------------------------------------------------- Ship it! I don't love this because it bypasses the "api" for management objects. But... I can't think of anything better. And because we're implementing and instrumenting a fast-path, it is in some cases appropriate to design a "layer-violation" such as this for the sake of performance. Ted On 2012-02-29 18:42:59, Kenneth Giusti wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/ ----------------------------------------------------------- (Updated 2012-02-29 18:42:59) Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross. Summary ------- For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters. On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver): TRUNK msg/sec Sender1 : Sender2 : Receiver 25825 : 25359 : 43805 25251 : 25159 : 43379 24564 : 24211 : 38836 PATCHED msg/sec Sender1 : Sender2 : Receiver 26015 : 25849 : 44473 26197 : 25765 : 44291 25826 : 25476 : 43904 This addresses bug qpid-3875. https://issues.apache.org/jira/browse/qpid-3875 Diffs ----- /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123 /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123 Diff: https://reviews.apache.org/r/4107/diff Testing ------- make check Thanks, Kenneth
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4107/#review5463
        -----------------------------------------------------------

        Ship it!

        (For consistency it might be nice to have the txn stats (txnMsgEnqueues/Dequeues) use the same mechanism?)

        • Gordon

        On 2012-02-29 18:42:59, Kenneth Giusti wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4107/

        -----------------------------------------------------------

        (Updated 2012-02-29 18:42:59)

        Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross.

        Summary

        -------

        For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters.

        On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver):

        TRUNK

        msg/sec

        Sender1 : Sender2 : Receiver

        25825 : 25359 : 43805

        25251 : 25159 : 43379

        24564 : 24211 : 38836

        PATCHED

        msg/sec

        Sender1 : Sender2 : Receiver

        26015 : 25849 : 44473

        26197 : 25765 : 44291

        25826 : 25476 : 43904

        This addresses bug qpid-3875.

        https://issues.apache.org/jira/browse/qpid-3875

        Diffs

        -----

        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123

        /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123

        /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123

        Diff: https://reviews.apache.org/r/4107/diff

        Testing

        -------

        make check

        Thanks,

        Kenneth

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/#review5463 ----------------------------------------------------------- Ship it! (For consistency it might be nice to have the txn stats (txnMsgEnqueues/Dequeues) use the same mechanism?) Gordon On 2012-02-29 18:42:59, Kenneth Giusti wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/ ----------------------------------------------------------- (Updated 2012-02-29 18:42:59) Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross. Summary ------- For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters. On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver): TRUNK msg/sec Sender1 : Sender2 : Receiver 25825 : 25359 : 43805 25251 : 25159 : 43379 24564 : 24211 : 38836 PATCHED msg/sec Sender1 : Sender2 : Receiver 26015 : 25849 : 44473 26197 : 25765 : 44291 25826 : 25476 : 43904 This addresses bug qpid-3875. https://issues.apache.org/jira/browse/qpid-3875 Diffs ----- /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123 /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123 Diff: https://reviews.apache.org/r/4107/diff Testing ------- make check Thanks, Kenneth
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4107/
        -----------------------------------------------------------

        Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross.

        Summary
        -------

        For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters.

        On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver):

        TRUNK
        msg/sec
        Sender1 : Sender2 : Receiver
        25825 : 25359 : 43805
        25251 : 25159 : 43379
        24564 : 24211 : 38836

        PATCHED
        msg/sec
        Sender1 : Sender2 : Receiver
        26015 : 25849 : 44473
        26197 : 25765 : 44291
        25826 : 25476 : 43904

        This addresses bug qpid-3875.
        https://issues.apache.org/jira/browse/qpid-3875

        Diffs


        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123
        /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123
        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123
        /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123
        /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123

        Diff: https://reviews.apache.org/r/4107/diff

        Testing
        -------

        make check

        Thanks,

        Kenneth

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4107/ ----------------------------------------------------------- Review request for qpid, Andrew Stitcher, Gordon Sim, and Ted Ross. Summary ------- For each message received or sent, the broker needs to increment at least 4 statistic counters - # of received/send messages, and # of received/sent bytes. For each counter accessed, a call to thread-local storage is made. This patch reduces the number of calls to thread local storage by allowing direct access to the full complement of per thread counters. On my cpu-challenged laptop, this resulted in a minor speedup in msgs/sec across a shared queue (2 senders, 1 receiver): TRUNK msg/sec Sender1 : Sender2 : Receiver 25825 : 25359 : 43805 25251 : 25159 : 43379 24564 : 24211 : 38836 PATCHED msg/sec Sender1 : Sender2 : Receiver 26015 : 25849 : 44473 26197 : 25765 : 44291 25826 : 25476 : 43904 This addresses bug qpid-3875. https://issues.apache.org/jira/browse/qpid-3875 Diffs /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1295123 /trunk/qpid/cpp/managementgen/qmfgen/templates/Class.h 1295123 /trunk/qpid/cpp/src/qpid/broker/Connection.cpp 1295123 Diff: https://reviews.apache.org/r/4107/diff Testing ------- make check Thanks, Kenneth
        Ken Giusti made changes -
        Field Original Value New Value
        Attachment CG-statspatch.tgz [ 12516581 ]
        Hide
        Ken Giusti added a comment -

        Attaching the cachegrind analysis for both pre and post patch.

        Notice the overall inclusive time spent in qpid::management::ManagementObject::getThreadIndex().

        Show
        Ken Giusti added a comment - Attaching the cachegrind analysis for both pre and post patch. Notice the overall inclusive time spent in qpid::management::ManagementObject::getThreadIndex().
        Ken Giusti created issue -

          People

          • Assignee:
            Ken Giusti
            Reporter:
            Ken Giusti
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development