  1. James Server
  2. JAMES-3777

Event sourcing - O[n²] storage for filters

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.7.0
    • Fix Version/s: 3.8.0
    • Component/s: None
    • Labels: None

    Description

      Symptoms

      ```
      Largest Partitions:
      [FilteringRule/xxx@linagora.com] 44952069 (45.0 MB)
      ```

      Every time this user sends an email, we load 45 MB of JSON, which can have a significant performance impact.

      What?

      We implemented event sourcing with reset. Given rules A and B, if we want to persist rule C, we store a "reset to A, B, C" event.

      So, if we want to store N filters, the resulting structure will have a size in O[n²], which proves to be barely sustainable.
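
      To make the growth concrete, here is a minimal sketch that counts how many rule copies end up stored after n writes, assuming every write adds exactly one rule. The class and method names are illustrative and are not part of James.

      ```java
      final class ResetEventGrowth {
          // Rule copies stored when each of the n writes persists the full rule list
          // as a "reset" event: 1 + 2 + ... + n.
          static long rulesStoredWithResets(long n) {
              return n * (n + 1) / 2;
          }

          // Rule copies stored when each write persists only the rule it adds.
          static long rulesStoredWithIncrements(long n) {
              return n;
          }

          public static void main(String[] args) {
              for (long n : new long[] {10, 100, 1000}) {
                  System.out.printf("n=%d resets=%d increments=%d%n",
                      n, rulesStoredWithResets(n), rulesStoredWithIncrements(n));
              }
          }
      }
      ```

      With 1000 writes, the reset approach stores 500500 rule copies against 1000 for the incremental one.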

      How to fix

      Coming back to O[n] would likely help.

      Implement filter addition / removal at both the storage and JMAP layers.
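
      A rough sketch of what addition / removal events could look like at the aggregate level follows. The event names (`RuleAdded`, `RuleRemoved`) and the use of plain strings for rules are assumptions made for illustration, not the actual James types. Rule ordering is simplified to append-only.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      final class IncrementalFilteringAggregate {
          // Hypothetical event contract: each event mutates the rule list it is applied to.
          interface Event {
              void applyTo(List<String> rules);
          }

          // Hypothetical event carrying only the rule being added.
          record RuleAdded(String rule) implements Event {
              public void applyTo(List<String> rules) { rules.add(rule); }
          }

          // Hypothetical event carrying only the rule being removed.
          record RuleRemoved(String rule) implements Event {
              public void applyTo(List<String> rules) { rules.remove(rule); }
          }

          // Rebuilds the current rule list by replaying the history in order.
          static List<String> replay(List<Event> history) {
              List<String> rules = new ArrayList<>();
              for (Event event : history) {
                  event.applyTo(rules);
              }
              return rules;
          }

          public static void main(String[] args) {
              List<Event> history = List.of(
                  new RuleAdded("A"), new RuleAdded("B"),
                  new RuleAdded("C"), new RuleRemoved("B"));
              System.out.println(replay(history)); // prints [A, C]
          }
      }
      ```

      Each event carries a single rule, so the stored history grows linearly with the number of changes.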

      Alternatives

      The read projection

      Currently, each time we process an email, we load the full history, build the aggregate, and perform SERIAL lightweight transactions. Email processing is very common, so this is impactful.

      It would be possible to introduce a read projection, maintained by a subscriber to the event source, that would allow efficiently reading the current filters for a given user.

      This means the history would be loaded only upon writes, which are rare.

      Impact: yet another table. Also, the solution is local to this usage and does not help other event sourcing usages.
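
      The idea can be sketched as follows, with an in-memory map standing in for the extra table. The `RulesReset` event, the `handle` callback and the class name are hypothetical and do not reflect the James subscriber API.

      ```java
      import java.util.List;
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      final class FilteringReadProjection {
          // Hypothetical event: the full rule list a user was reset to.
          record RulesReset(String username, List<String> rules) {}

          // Stand-in for the projection table: username -> current rules.
          private final Map<String, List<String>> currentRules = new ConcurrentHashMap<>();

          // Called by the event source subscriber after each write.
          void handle(RulesReset event) {
              currentRules.put(event.username(), List.copyOf(event.rules()));
          }

          // Read path used when processing an email: no history is loaded.
          List<String> rulesFor(String username) {
              return currentRules.getOrDefault(username, List.of());
          }

          public static void main(String[] args) {
              FilteringReadProjection projection = new FilteringReadProjection();
              projection.handle(new RulesReset("bob@example.com", List.of("A", "B")));
              projection.handle(new RulesReset("bob@example.com", List.of("A", "B", "C")));
              System.out.println(projection.rulesFor("bob@example.com")); // prints [A, B, C]
          }
      }
      ```

      Reads become a single lookup, while the cost of loading the history stays on the write path.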

      Event sourcing snapshots

      Augment the James event sourcing implementation with a snapshot mechanism.

      Upon reading history, we would first read the latest available snapshot, then read the history from that snapshot onward.

      The event store would be responsible for taking snapshots. Even one snapshot every 10 changes would do the job here.

      This implies being able to serialize state. This implies an additional table for storing event sourcing snapshots.
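
      A minimal in-memory sketch of such a mechanism, assuming events that each add one rule and a snapshot taken every 10 events. All names (`SnapshottingEventStore`, `Snapshot`, `SNAPSHOT_INTERVAL`) are hypothetical, and a real implementation would persist snapshots in their own table.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      final class SnapshottingEventStore {
          // Hypothetical snapshot: the serialized state after `version` events.
          record Snapshot(int version, List<String> rules) {}

          static final int SNAPSHOT_INTERVAL = 10; // one snapshot every 10 changes

          private final List<String> addedRules = new ArrayList<>(); // event history
          private Snapshot latestSnapshot = new Snapshot(0, List.of());
          int eventsReplayedOnLastLoad = 0; // exposed for illustration only

          // Appends one "rule added" event; the store itself decides when to snapshot.
          void append(String rule) {
              addedRules.add(rule);
              if (addedRules.size() % SNAPSHOT_INTERVAL == 0) {
                  latestSnapshot = new Snapshot(addedRules.size(), List.copyOf(load()));
              }
          }

          // Starts from the latest snapshot, then replays only the events after it.
          List<String> load() {
              List<String> rules = new ArrayList<>(latestSnapshot.rules());
              List<String> tail =
                  addedRules.subList(latestSnapshot.version(), addedRules.size());
              rules.addAll(tail);
              eventsReplayedOnLastLoad = tail.size();
              return rules;
          }

          public static void main(String[] args) {
              SnapshottingEventStore store = new SnapshottingEventStore();
              for (int i = 1; i <= 25; i++) {
                  store.append("rule-" + i);
              }
              System.out.println(store.load().size());            // prints 25
              System.out.println(store.eventsReplayedOnLastLoad); // prints 5
          }
      }
      ```

      After 25 events, a read starts from the snapshot taken at event 20 and replays only the 5 events that follow it.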

      My take on it: going `O[n²]` -> `O[n]` will likely be a good enough mitigation that we don't need to grow the complexity of the event sourcing code.

      On the other hand, snapshots would harden the event sourcing code and likely lift most of the limitations to its adoption on the mailbox write path (to enforce the mailbox name uniqueness constraint).

      Note that the two solutions are not mutually exclusive.

      The dirty fix

      For filters, the history prior to a reset event can be dropped. This can be used to solve the immediate problem, even if it is not very clean.
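
      As a sketch of that truncation, assuming a history held as an in-memory list: everything before the most recent reset event is discarded, since the reset already carries the full state. The event types below are hypothetical, and `RuleAdded` is included only to show that events after the last reset are kept.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      final class HistoryTruncation {
          interface Event {}

          // Hypothetical event: resets the user's filters to the given rule list.
          record RulesReset(List<String> rules) implements Event {}

          // Hypothetical incremental event, present only for illustration.
          record RuleAdded(String rule) implements Event {}

          // Keeps the latest reset event and whatever follows it; drops the rest.
          static List<Event> dropHistoryBeforeLastReset(List<Event> history) {
              for (int i = history.size() - 1; i >= 0; i--) {
                  if (history.get(i) instanceof RulesReset) {
                      return new ArrayList<>(history.subList(i, history.size()));
                  }
              }
              return history; // no reset event: nothing can be dropped
          }

          public static void main(String[] args) {
              List<Event> history = List.of(
                  new RulesReset(List.of("A")),
                  new RulesReset(List.of("A", "B")),
                  new RulesReset(List.of("A", "B", "C")),
                  new RuleAdded("D"));
              System.out.println(dropHistoryBeforeLastReset(history).size()); // prints 2
          }
      }
      ```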

      Proposal

      • Implement a read projection
      • Implement addition / removal patches to filtering event sourcing aggregate
      • Don't implement event sourcing snapshots now

      And also: remove the obligation to configure the JMAP filtering mailet inside JMAP servers. After all, this extension is not standard.

          People

            Assignee: Unassigned
            Reporter: Benoit Tellier (btellier)

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 2h 50m