James Server / JAMES-3599

Improve the design of the RabbitMQ eventbus

Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 3.6.0
    • Fix Version/s: 3.7.0
    • Component/s: mailbox, rabbitmq
    • Labels: None

    Description

      Mailing list discussion: https://www.mail-archive.com/server-dev@james.apache.org/msg70437.html

      I spent some time digging into RabbitMQ performance and stability.

      A few weeks ago I was surprised to discover how much work the play.json
      library performs, and could not quite explain why it was taking 3% of
      CPU time, making it the largest CPU consumer for mailbox events.
      RabbitMQ acks account for another 1.20% of CPU time.

      Investigating the RabbitMQ eventbus, I realized that events are routed
      to all group queues, then dispatched and deserialized, and only applied
      if relevant.

      Given 200 events/s and a JMAP server with 10 groups, we end up
      deserializing 2000 events/s, even when the events are irrelevant for
      the groups.
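
      The arithmetic above can be illustrated with a small in-memory model.
      This is only a sketch: PerGroupFanOut and dispatch are hypothetical
      names, not James APIs, each group is reduced to a relevance predicate,
      and String.trim() stands in for the JSON deserialization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Illustrative model only: class and method names are hypothetical, not James APIs.
final class PerGroupFanOut {

    // Each group is modelled as a predicate telling whether an event is relevant to it.
    // Returns {number of deserializations, number of events actually applied}.
    static long[] dispatch(List<String> payloads, List<Predicate<String>> groups) {
        long deserialized = 0;
        long applied = 0;
        for (String payload : payloads) {
            for (Predicate<String> group : groups) {
                // Every group queue receives its own copy and deserializes it.
                String event = payload.trim(); // stand-in for JSON deserialization
                deserialized++;
                if (group.test(event)) {
                    applied++;
                }
            }
        }
        return new long[] {deserialized, applied};
    }

    public static void main(String[] args) {
        List<Predicate<String>> groups = new ArrayList<>();
        groups.add(event -> event.startsWith("Added")); // the only interested group
        for (int i = 1; i < 10; i++) {
            groups.add(event -> false); // nine groups that ignore this event type
        }
        long[] counts = dispatch(Collections.nCopies(200, " Added:1 "), groups);
        // prints "2000 deserializations, 200 applied"
        System.out.println(counts[0] + " deserializations, " + counts[1] + " applied");
    }
}
```

      With 200 events and 10 groups of which only one is interested, the
      model performs 2000 deserializations to apply 200 events.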

      As I recall, we wanted the event per group to be the unit of retry,
      which is a noble design goal.

      I think parallelizing groups is a non-goal: this kind of optimization
      would not improve response time, as event processing is asynchronous
      and runs in the background, and it makes little sense at thousands of
      requests per second.

      However, ending up with one queue per event is likely sub-optimal. I
      think the design can be improved by, in the nominal case, transmitting
      only one message for all groups. The consumer receiving it will then
      try to execute the listeners of all groups. We can keep retries for
      individual groups (with their dedicated exchanges and queues): upon
      failure, we republish to the retry exchange of the failing listener.
      This makes the upgrade path easy too, as the group queues keep being
      consumed. One would just need to do some unbindings...
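
      The proposed flow could be sketched as follows. This is a minimal
      in-memory model under stated assumptions, not the James implementation:
      SharedDispatch, register and onMessage are hypothetical names, the
      per-group retry exchange is modelled as a plain list, String.trim()
      stands in for the JSON deserialization, and no RabbitMQ client call is
      involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative model only: names are hypothetical, not James or RabbitMQ APIs.
final class SharedDispatch {
    // Group name -> listener. LinkedHashMap keeps registration order, so
    // listeners run in a deterministic order.
    private final Map<String, Consumer<String>> listeners = new LinkedHashMap<>();
    // Group name -> payloads republished for retry (stand-in for the
    // dedicated retry exchange and queue of each group).
    final Map<String, List<String>> retryExchanges = new LinkedHashMap<>();
    int deserializations = 0;

    void register(String group, Consumer<String> listener) {
        listeners.put(group, listener);
        retryExchanges.put(group, new ArrayList<>());
    }

    // Nominal case: one message, one deserialization, all groups executed.
    void onMessage(String payload) {
        String event = payload.trim(); // stand-in for a single JSON deserialization
        deserializations++;
        for (Map.Entry<String, Consumer<String>> entry : listeners.entrySet()) {
            try {
                entry.getValue().accept(event);
            } catch (RuntimeException e) {
                // Only the failing group gets the event republished for retry.
                retryExchanges.get(entry.getKey()).add(payload);
            }
        }
    }

    public static void main(String[] args) {
        SharedDispatch bus = new SharedDispatch();
        bus.register("groupA", event -> System.out.println("A handled " + event));
        bus.register("failing", event -> { throw new IllegalStateException("boom"); });
        bus.register("groupC", event -> System.out.println("C handled " + event));
        bus.onMessage(" Added:1 ");
        System.out.println("deserializations: " + bus.deserializations);
        System.out.println("retries for failing: " + bus.retryExchanges.get("failing").size());
    }
}
```

      In this model a failure in one listener neither blocks the other groups
      nor triggers a retry for them, so the event per group remains the unit
      of retry while the nominal path deserializes only once.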

      Note that such an evolution would:

      • also enable us, if we want, to enforce some execution order for
        listeners, opening the way to fix things like JAMES-3561
        <https://issues.apache.org/jira/browse/JAMES-3561>
      • serve as an inspiration for future eventBus implementations like the
        Pulsar one, hence getting feedback on the existing design is IMO
        useful.

      I will create a JIRA ticket holding the design proposal (schema) and how
      it differs from the previous one, as well as some RabbitMQ management
      screenshots.

      Attachments

        1. design_after.png
          222 kB
          Benoit Tellier
        2. design_before.png
          183 kB
          Benoit Tellier
        3. rabbitmq-management.png
          230 kB
          Benoit Tellier

            People

              Assignee: Unassigned
              Reporter: Benoit Tellier (btellier)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h